Compliance & Standards

Vendor Governance for AI in Credentialing: The Questions to Ask in Every RFP

Most credentialing organisations do not build their own AI. They consume it through platform vendors, proctoring partners, scoring services, item generation tools, and the embedded features that arrive inside products they already buy. The accountability for what that AI does, however, stays with the credential owner. When a candidate appeals, when a regulator asks, when an employer challenges a decision, the answer “our vendor does that” is not a defence. It is an admission that governance has not been done.

“‘Our vendor does that’ is not a defence. It is an admission that governance has not been done.”

This article is a practical guide to closing that gap. It lists the questions credentialing organisations should be asking in every AI-touching RFP, every renewal, and every vendor change review. Use it as the procurement checklist your team needs and the contractual position your legal team can negotiate from. It is written for credentialing leaders who already know vendor governance matters and who want a structured way to make it real.

Why vendor governance is the largest single area of exposure

There are three reasons vendor governance is where credentialing AI risk concentrates.

The first is volume. Most of the AI in your operation is operated by people who do not work for you. Their model choices, their training data, their update schedules, and their incident response practices all sit outside your direct control. A weak supplier introduces risk that no internal control can fully offset.

The second is opacity. Vendors are often reluctant to share the details of how their models work, how they were trained, or how they handle drift. Some of this is reasonable commercial confidentiality. Some of it is a problem dressed up as confidentiality. The difference matters, and the way to surface it is to ask specific questions and accept specific answers, not to accept generic assurances.

The third is change. AI systems evolve continuously. A vendor who passed your due diligence at procurement may be operating a materially different system six months later. Without contractual change control, that drift is invisible until something goes wrong.

The questions in this article are designed to address all three. They are not a substitute for legal advice. They are the operational substance that your legal team needs to translate into enforceable terms.

Category one: model and system documentation

Start with the basics. If a supplier cannot describe their AI clearly, they cannot govern it, and you cannot rely on it.

Ask the supplier to describe the AI system in plain language. What does it do, what does it not do, what is its intended purpose, and what are its known limitations? You are looking for an answer that an assessment design lead can read and understand, not a marketing brochure. If the description is vague or evasive, that is a finding.

Ask whether the system is built in-house or relies on third-party models. Many AI features are layered on top of foundation models from providers like OpenAI, Anthropic, Google, or open-source projects. You need to know which providers are in the chain because their model updates, terms of service changes, and incident histories affect your operation.

Ask for the technical documentation that supports the system. This is the documentation the EU AI Act expects for high-risk uses, including system description, intended purpose, validation results, performance characteristics, and known limitations. If your supplier does not have this in a form they can share with you, they are not yet operating to the standard regulators will expect, and that is a procurement signal.

Category two: validity, fairness, and bias monitoring

For any AI that influences scoring, proctoring, or credential decisions, the supplier needs to demonstrate that the system has been evaluated for the audiences your credential serves.

Ask what validity evidence exists for the system in your specific use context. Generic accuracy claims on a vendor benchmark are not enough. You need evidence that the system performs appropriately on candidates like yours, in the assessment environment you operate, on the constructs you measure. If the vendor’s evidence is generic, you need to commit to running your own evaluation before operational use.

Ask how subgroup performance is measured and reported. The supplier should be able to tell you which subgroups they monitor, what metrics they report, what thresholds trigger investigation, and how often the analysis is run. If they do not monitor by subgroup at all, that is a serious gap. If they monitor but cannot share the methodology, that is also a gap.

Ask what bias mitigation steps the supplier takes when issues are identified. The answer should describe a process: detection, root cause analysis, mitigation, validation, deployment, and ongoing monitoring. The absence of a process means bias issues are handled ad hoc, which is not a position you can rely on in an appeal.

Ask whether the supplier has had any documented cases of subgroup performance issues, and what the response was. A supplier who claims they have never had such an issue is either not monitoring, not deployed at scale, or not being candid. A supplier who can describe an issue, the root cause, and the corrective action is a supplier who is actually running governance.

Category three: change control and model updates

This is the category that most procurement processes miss, and it is the category where post-procurement risk concentrates.

Ask what counts as a material change to the AI system. The supplier should have a clear definition: model retraining, base model upgrade, training data changes, threshold adjustments, scoring rubric changes, new capability addition, or feature removal. If they cannot define material change, they cannot notify you of one.

Ask for written commitment that the supplier will notify you in advance of material changes that affect your use case, with a defined notice period. Thirty days is reasonable for most changes. Same-day notification is acceptable only for emergency security patches. Anything less than this means changes happen without your knowledge, and your governance trail breaks.

Ask whether you can defer or refuse a change. The supplier may not be able to maintain a separate version for you indefinitely, but you should have the right to assess the impact before adoption and to delay deployment in your environment for a defined window. This is especially important during exam cycles, where mid-cycle changes can introduce inconsistency that invalidates your validity argument.

Ask how change history is documented. You want to be able to point at a candidate decision and say which version of the system was operating when that decision was made. Without versioned change history, you cannot defend historical decisions when current behaviour differs.
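Resolving a candidate decision to the system version in force at the time is a mechanical lookup once the change history is versioned. A minimal sketch, assuming a simple dated list of version entries rather than any particular vendor's API (the version labels and dates are hypothetical):

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical version history: (effective-from date, version label),
# sorted by effective date. In practice this comes from the supplier's
# documented change log.
VERSION_HISTORY = [
    (datetime(2024, 9, 1), "proctoring-model 2.0"),
    (datetime(2025, 1, 15), "proctoring-model 2.1"),
    (datetime(2025, 4, 3), "proctoring-model 3.0"),
]

def version_at(decision_time: datetime) -> str:
    """Return which system version was operating when a decision was made."""
    dates = [effective for effective, _ in VERSION_HISTORY]
    idx = bisect_right(dates, decision_time) - 1
    if idx < 0:
        raise ValueError("decision predates recorded version history")
    return VERSION_HISTORY[idx][1]

# An appeal about a decision made on 2025-02-10 must be answered
# against version 2.1, not against the current 3.0.
print(version_at(datetime(2025, 2, 10)))  # proctoring-model 2.1
```

The point of the sketch is the shape of the evidence, not the code: without dated version boundaries on record, the lookup is impossible and historical decisions cannot be defended.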

Category four: logging, traceability, and audit access

Logs are the evidence base for everything else. A supplier who cannot give you access to logs cannot help you defend a decision.

Ask what is logged for every AI interaction in your use case. At minimum, you need: the input the system received, the output the system produced, the version of the system, the timestamp, the human reviewer where applicable, and any override reasons. For proctoring, you also need the flag rationale and the human review decision. For scoring, you need the score output and any intermediate confidence values.
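The minimum log record above can be written down as a schema, which is a useful exhibit to attach to the RFP itself. This is an illustrative sketch only; the field names are hypothetical and would need to be mapped to whatever the supplier's logging actually exposes:

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Illustrative minimum log record for one AI interaction.
# Field names are hypothetical, not a vendor API.
@dataclass
class AIInteractionLog:
    timestamp: str             # e.g. ISO 8601 "2025-03-14T10:22:05Z"
    system_version: str        # exact system/model version in operation
    input_ref: str             # the input received, or a pointer to it
    output: str                # the output produced
    human_reviewer: Optional[str] = None   # reviewer ID, where applicable
    override_reason: Optional[str] = None  # why a human overrode the output
    flag_rationale: Optional[str] = None   # proctoring: why the flag fired
    review_decision: Optional[str] = None  # proctoring: human review outcome
    score: Optional[float] = None          # scoring: the score output
    confidence: Optional[float] = None     # scoring: intermediate confidence

record = AIInteractionLog(
    timestamp="2025-03-14T10:22:05Z",
    system_version="scoring-model 4.2.1",
    input_ref="response-8812",
    output="band 3",
    score=3.0,
    confidence=0.87,
)
print(asdict(record)["system_version"])
```

If the supplier cannot populate every required field for your use case, each empty field is a concrete, negotiable gap rather than a vague concern.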

Ask for how long logs are retained and how you access them. Retention should be at least as long as your appeals window plus a buffer. Access should be available through a defined process that does not require renegotiating the contract every time. Read-only access through a dashboard or API is the working standard. Logs that only the supplier can see are not useful to your governance.

Ask whether you can run independent audits or commission third-party audits. For high-stakes uses, this is a non-negotiable. The supplier may want notice and reasonable scope limits, both of which are fine. What is not fine is a blanket refusal of audit access, because that is the position that looks worst when something has gone wrong.

Category five: incident notification and response

Incidents will happen. Vendor governance is judged on how quickly you find out and how clearly the response runs.

Ask the supplier to define what counts as an incident affecting your use case. A reasonable definition includes: unexpected performance degradation, fairness issues identified in monitoring, security breaches, data leakage, model behaviour outside expected bounds, and any candidate-impacting bug. The list does not need to be exhaustive, but the principles need to be clear.

Ask for the incident notification timeline. For incidents that have or could have affected your candidates, twenty-four hours is the reasonable maximum. For incidents that pose immediate candidate risk, immediate notification is the standard.

Ask what the incident response process looks like, including containment, candidate impact assessment, root cause analysis, corrective action, and post-incident review. You want a process you can verify, not a promise you cannot. Ask to see an example post-incident report from a previous incident, with appropriate redactions.

Category six: prohibited practices and regulatory alignment

This is the category that catches the issues that should never have been deployed in the first place.

Ask the supplier to confirm in writing that the system does not perform emotion recognition in any form. Under the EU AI Act, emotion recognition in educational and workplace contexts is a prohibited practice, and the obligation to avoid it sits with you whether or not the feature is enabled by default. Get the confirmation in writing, because it puts the burden of accuracy on the supplier.

Ask whether the supplier is aware of any features in the product that could be classified as high-risk under the EU AI Act, and how those features are governed. A supplier who has thought about this can answer it. A supplier who has not is going to learn the hard way, possibly using your operation as the test case.

Ask which standards the supplier aligns to. ISO 42001 is the AI management system standard. ISO 23894 is the AI risk management guidance. SOC 2 covers operational controls. GDPR or equivalent privacy regimes cover data handling. The supplier does not need to be certified against all of these, but they should be able to describe how they align and where they are in the certification journey if they have started.

Ask whether the supplier has third-party assurance reports they can share. SOC 2 reports, penetration test summaries, ISO certification scopes, and bias audit reports are all useful. The willingness to share is itself a signal.

Category seven: data governance and privacy

For credentialing AI, candidate data is the most sensitive material in the system. Vendor governance has to cover it.

Ask where candidate data is stored, processed, and transmitted. Specific countries and regions, not just “the cloud”. Data sovereignty matters for legal exposure and for candidate trust.

Ask what candidate data is used to train or improve the supplier’s models. The answer you want is “none of yours, unless we explicitly opt in to a contract that says so”. Some suppliers train on aggregated customer data by default, and that is a position credentialing organisations should generally refuse.

Ask about data deletion. When a candidate exercises their right to deletion, or when your contract ends, what happens to their data, the model artefacts derived from it, and the logs that reference it? The answer should be specific and contractually binding.

Ask what happens to candidate data in the event of a supplier acquisition, insolvency, or service discontinuation. This is the question nobody likes to answer, and it is the one that protects you when business circumstances change.

The seven RFP categories at a glance

  • Model and system documentation — plain-language description, model chain, technical documentation
  • Validity, fairness, and bias monitoring — context-specific evidence, subgroup methodology, mitigation process
  • Change control and model updates — material change definition, notification, deferral rights, version history
  • Logging, traceability, and audit access — full interaction logs, retention, independent audit rights
  • Incident notification and response — definition, 24-hour maximum, verifiable process, example reports
  • Prohibited practices and regulatory alignment — written emotion recognition exclusion, EU AI Act position, standards
  • Data governance and privacy — sovereignty, training data exclusion, deletion, business-continuity protections

What to do with the answers

The answers to these questions are not pass or fail in isolation. They are evidence of how the supplier operates. A supplier who answers most questions clearly and identifies their gaps honestly is more trustworthy than one who answers everything with reassuring vagueness. A supplier who refuses to answer or who obstructs the questions is telling you something important about the risk you are taking on.

For each RFP, build the answers into the procurement record and feed the high-stakes ones into your AI register. For each renewal, run the questions again and compare to the previous answers. Changes are interesting. Improvements are positive signals. Regression is a finding worth escalating.
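The renewal comparison above can be partly automated: diff this cycle's answers against the last and escalate the regressions. A minimal sketch with hypothetical question keys and answers (in practice these come from your procurement record or AI register):

```python
# Vendor answers from two procurement cycles. Keys and values
# are hypothetical examples.
previous = {
    "subgroup_monitoring": "quarterly, methodology shared",
    "change_notice_days": "30",
    "audit_access": "third-party audits permitted",
}
current = {
    "subgroup_monitoring": "annually, methodology on request",
    "change_notice_days": "30",
    "audit_access": "third-party audits permitted",
    "emotion_recognition": "confirmed excluded in writing",
}

def diff_answers(prev: dict, curr: dict) -> dict:
    """Flag changed, new, and dropped answers between two RFP cycles."""
    return {
        "changed": {k: (prev[k], curr[k])
                    for k in prev.keys() & curr.keys()
                    if prev[k] != curr[k]},
        "new": sorted(curr.keys() - prev.keys()),
        "dropped": sorted(prev.keys() - curr.keys()),  # candidates for escalation
    }

report = diff_answers(previous, current)
# Here, subgroup monitoring moved from quarterly to annual: a regression
# worth escalating, exactly as the process above describes.
print(sorted(report["changed"]))
```

The value is not the diff itself but the discipline: answers only become comparable across cycles if they are captured in a structured form at each RFP.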

The overall test is simple. Could you answer a regulator’s question about the AI in your credential by combining your own evidence with what your supplier provides? If yes, your vendor governance is working. If no, the gap is procurement work, not technology work.

The strategic position

Procurement teams sometimes treat these questions as a burden on suppliers. The opposite is true. Suppliers who handle these questions well are the suppliers who will be operating sustainably in the regulated AI environment that is now arriving. Suppliers who cannot are going to face the same questions from every customer eventually, and the ones who learn first will have a market advantage.

“Suppliers who handle these questions well are the suppliers who will be operating sustainably in the regulated AI environment that is now arriving.”

By asking these questions consistently, credentialing organisations help shape the standard the AI assessment market operates to. That is a stronger position than waiting for the standard to be imposed and then trying to retrofit suppliers who never expected it.

“If a supplier cannot meet a meaningful subset of these expectations, that is a procurement decision, not a negotiation.”

If a supplier cannot meet a meaningful subset of these expectations, that is a procurement decision, not a negotiation. Better to walk away early than to discover the gap during an appeal. The companion pieces on the construct decision and the Testing Standards walk through the upstream and downstream of this work.

Ready to put real questions into your AI vendor RFPs?

Talk to our team about how Globebyte can help you build a procurement checklist your legal team can negotiate from.

Explore our services
