If your defence against AI misuse in credentialing assessments is an AI detection tool, you are building on sand. Detection tools have known accuracy problems, generate fairness risks of their own, and cannot survive scrutiny in an appeal. They are not a strategy. They are at best a triage signal, and at worst a liability.
The better approach exists, and it is published. Ofqual, the qualifications regulator for England, and the Joint Council for Qualifications (JCQ) have produced the clearest operational guidance available on AI in assessments. This article translates that guidance into what credentialing leaders should be doing, with or without a UK presence, and explains why design-led integrity is the only approach that holds up at scale.
“If your defence against AI misuse is an AI detection tool, you are building on sand.”
The problem with detection-led integrity
AI writing detection tools were marketed as the answer to the authenticity crisis when generative AI first arrived. The market has since corrected. OpenAI withdrew its own classifier, citing low accuracy. Turnitin has published candid analysis of false positive rates in its detector. Independent research has consistently shown that these tools disproportionately flag non-native English writers, miss lightly edited AI output, and fail against anything more sophisticated than a basic copy and paste.
The problem is structural, not solvable by better tools. Generative AI models are designed to produce text that resembles human writing. Detection models are designed to spot the difference. The race is permanent, and the detector is permanently behind.
For credentialing bodies, the implication is that any process that converts a detector flag into a misconduct outcome is exposed. A determined candidate can defeat the detector. An innocent candidate can be wrongly flagged, particularly if they fall into a subgroup the detector handles poorly. Either way, the process fails. The credible defence against AI misuse cannot rest on the tool. It has to rest on the way the assessment is designed and verified.
Why Ofqual and JCQ matter beyond the UK
Ofqual is the independent regulator of qualifications in England, with oversight that includes apprenticeship end-point assessments. JCQ issues operational guidance used across the UK qualifications ecosystem, particularly on malpractice and integrity. Together they produce some of the most detailed and operationally usable AI guidance in any jurisdiction.
For credentialing organisations outside the UK, the question is whether this guidance is relevant. The answer is yes, for two reasons. First, Ofqual recognises cross-border dynamics and participates in international AI standards discussions. Second, the practical guidance JCQ publishes is independent of UK qualification structures. The principles transfer cleanly to professional credentialing, micro-credentials, and certification programmes anywhere.
These publications are credible signal documents for where the field is moving. Credentialing leaders who track them can stay ahead of expectations that will eventually surface in their own jurisdictions.
Three rules that come through clearly
Across the Ofqual and JCQ guidance, three rules consistently apply.
The first rule is that authenticity is non-negotiable. Candidate work must be the candidate’s own. If AI-generated content is reproduced in a submission, it must be identified as such, and it does not demonstrate independent competence unless the use of AI is itself part of what is being assessed. Misrepresentation is malpractice. The clarity of this position matters because it shifts the burden away from detection and onto disclosure.
The second rule is that AI can support marking, but should not be the sole decision-maker in high-stakes contexts. Human judgement remains central. Any evidence that supports an AI marking approach has to be specific to the qualification, the construct, and the population. Generic vendor claims do not satisfy this expectation.
The third rule is that AI proctoring and anomaly detection must include human oversight and a clear path to challenge. Every AI flag has to be reviewable, explainable, and appealable. This is the rule that closes the door on detector-only enforcement. If a flag cannot be reviewed by a human and challenged by the candidate, it cannot be the basis for a decision.
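To make that concrete: a flag that meets this rule has to carry more than a score. The sketch below is a hypothetical record in Python, with invented field names rather than anything drawn from the Ofqual or JCQ texts, showing the minimum that "reviewable, explainable, appealable" implies.

```python
from dataclasses import dataclass

@dataclass
class IntegrityFlag:
    """Minimum record an AI-raised flag needs to be reviewable,
    explainable, and appealable. Field names are illustrative."""
    submission_id: str
    raised_by_tool: str
    explanation: str                     # what the tool observed, in plain language
    reviewed_by: str | None = None       # no outcome until a named human reviews
    review_outcome: str | None = None
    appeal_reference: str | None = None  # link into the published appeals process
```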
The three rules from Ofqual and JCQ
- Authenticity is non-negotiable. Candidate work must be the candidate’s own, with AI use disclosed where permitted.
- AI can support marking but cannot decide alone. Human judgement stays central in high-stakes contexts.
- Every flag must be reviewable, explainable, and appealable. Detector-only enforcement does not meet the bar.
Detection theatre, and what to use detection for
The phrase that captures the dysfunction here is detection theatre. It looks like the organisation is doing something to protect integrity. It generates reports. It creates the illusion of control. But the underlying process is brittle, and when challenged, it does not hold.
“The race is permanent, and the detector is permanently behind.”
Detection tools have a role, but it is a narrow one. They are useful as a triage signal, helping a small team focus its attention on cases that warrant a closer look. They are not evidence in themselves. The decision standard for misconduct should never be a detector score alone, no matter how confident the tool reports itself to be.
The best practice is to treat detector outputs the same way you would treat any other anomaly signal. They identify a question worth investigating. They do not answer it. The investigation, and the evidence, have to come from elsewhere.
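A minimal sketch of that separation, assuming a Python case-handling system and an invented review threshold: the detector score can open an investigation, but nothing in the code can turn it into a finding.

```python
from dataclasses import dataclass

@dataclass
class DetectorFlag:
    """An anomaly signal from a detection tool. A score, not evidence."""
    submission_id: str
    tool_name: str
    score: float  # the tool's self-reported confidence, 0.0 to 1.0

def triage(flag: DetectorFlag, review_threshold: float = 0.8) -> str:
    """Route a detector flag to a next step, never to an outcome.

    The only possible results are 'no_action' and 'human_review'.
    There is deliberately no path from a score to a misconduct
    finding; the threshold value is an illustrative assumption.
    """
    if flag.score >= review_threshold:
        # Queue for a human investigator, who gathers independent
        # evidence: drafts, change history, a verification interview.
        return "human_review"
    return "no_action"
```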
Design integrity in: secure conditions, vivas, and evidence trails
The robust alternative to detection-led integrity is design-led integrity. The principle is that the assessment is designed in such a way that AI misuse either cannot happen or cannot succeed without leaving evidence.
For secure assessments, this means controlled conditions that genuinely prevent unauthorised AI access: test centres, tightly proctored remote sessions with appropriate device controls, and item types that require application and reasoning rather than pattern recall. The construct being measured can be defined as unaided competence, and the conditions enforce that definition.
For take-home and portfolio assessments, the answer is verification. Mandatory AI disclosure as part of the submission, oral defence or structured viva for medium and high-stakes work, and evidence trails such as drafts, change history, prompts, and references that demonstrate the candidate’s process. A short verification interview tied to a sample of submissions catches more genuine misuse than any detector and does so in a way that is defensible in an appeal.
For portfolios and workplace evidence, supervisor attestation, spot audits, and structured interviews provide the verification layer. The candidate’s process is the evidence, not just the artefact.
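The sampling step behind the verification interview is straightforward to operationalise. A minimal sketch, assuming a flat random draw and an invented 15 percent rate; a real programme might stratify or risk-weight the sample.

```python
import random

def select_for_verification(submission_ids: list[str],
                            sample_rate: float = 0.15,
                            seed: int | None = None) -> list[str]:
    """Pick a random sample of submissions for a verification interview.

    The deterrent works because every candidate knows selection is
    possible. The rate and the simple random draw are illustrative
    assumptions, not requirements from the guidance.
    """
    if not submission_ids:
        return []
    rng = random.Random(seed)
    k = max(1, round(len(submission_ids) * sample_rate))
    return rng.sample(submission_ids, k)
```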
This is more work than running a detector across submissions. It is also work that survives scrutiny. The investment is in process design once, not in defending shaky decisions case by case.
Publish a candidate AI policy that you can actually enforce
You cannot enforce a policy candidates have never seen. The first concrete action is to publish a clear candidate AI policy that tells candidates what is allowed, what is prohibited, and what they have to disclose, broken down by assessment type.
A workable policy distinguishes between three positions:
- AI use is banned for this assessment, because the construct measures unaided competence
- AI use is permitted with disclosure, because the construct allows tool-supported work
- AI use is required and assessed, because the construct measures responsible use of AI in professional practice
The same candidate may sit different components under different positions. That is acceptable as long as the rules for each are explicit. The policy should also explain what happens if rules are breached and what process the candidate can expect.
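One way to keep those per-component rules explicit is to encode them once, so every system and reviewer reads from the same source of truth. A sketch in Python, with hypothetical component names:

```python
from enum import Enum

class AIPolicy(Enum):
    BANNED = "banned"                    # construct measures unaided competence
    PERMITTED_WITH_DISCLOSURE = "permitted_with_disclosure"
    REQUIRED_AND_ASSESSED = "required_and_assessed"  # AI use is the construct

# Hypothetical components: the same candidate can sit each under a
# different position, as long as the rule for each is explicit.
COMPONENT_POLICIES = {
    "proctored_exam": AIPolicy.BANNED,
    "take_home_case_study": AIPolicy.PERMITTED_WITH_DISCLOSURE,
    "ai_practice_portfolio": AIPolicy.REQUIRED_AND_ASSESSED,
}

def disclosure_required(component: str) -> bool:
    """Disclosure is mandatory wherever AI use is allowed at all."""
    return COMPONENT_POLICIES[component] is not AIPolicy.BANNED
```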
When candidates know the rules and the consequences, the integrity baseline lifts on its own. Most misconduct is not malicious. It is candidates guessing in the absence of clear guidance.
Misconduct categories and proportionate response
Ofqual and JCQ both expect proportionate, evidence-based handling of misconduct. A single tier of “you cheated” is not enough. A workable framework distinguishes three tiers.
Tier one is administrative breach. Incomplete disclosure on a permitted tool, or a minor procedural error, where the work is still demonstrably the candidate’s own. The proportionate outcome is corrective action, a warning, or candidate education.
Tier two is substantive misrepresentation. Undisclosed AI generation of meaningful content, or other misuse that materially affects the evidence of competence. The outcome is component fail or attempt invalidation, with a formal record.
Tier three is fraud or security breach. Impersonation, organised cheating, or leaking secure materials. The outcome is disqualification, a ban period, and reporting where required.
This tiered approach is fairer to candidates and more defensible in appeals. It also gives the organisation a structure for consistent handling, which is itself a key requirement under both Ofqual and JCQ guidance.
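For teams that track cases in software, the tiers translate directly into a structure that supports consistent handling. A minimal Python sketch; the outcome labels are illustrative, not quoted from the guidance.

```python
from enum import IntEnum

class MisconductTier(IntEnum):
    ADMINISTRATIVE_BREACH = 1          # e.g. incomplete disclosure on a permitted tool
    SUBSTANTIVE_MISREPRESENTATION = 2  # e.g. undisclosed AI generation of content
    FRAUD_OR_SECURITY_BREACH = 3       # e.g. impersonation, leaked secure materials

# Proportionate outcomes per tier, so handling stays consistent
# across cases and defensible in an appeal.
PROPORTIONATE_OUTCOMES = {
    MisconductTier.ADMINISTRATIVE_BREACH: [
        "corrective_action", "warning", "candidate_education",
    ],
    MisconductTier.SUBSTANTIVE_MISREPRESENTATION: [
        "component_fail", "attempt_invalidation", "formal_record",
    ],
    MisconductTier.FRAUD_OR_SECURITY_BREACH: [
        "disqualification", "ban_period", "external_reporting",
    ],
}
```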
Three actions to take this quarter
The work is operational, and it is achievable in a single quarter for most credentialing programmes.
First, publish your candidate AI policy. Define allowed, prohibited, and required uses by assessment type, with clear disclosure rules. Make it visible to candidates before they sit any component.
Second, lock in human authority at every key decision point. No AI-only decisions on scoring, no AI-only decisions on integrity flags, no AI-only credential awards. Document the people authorised to override AI outputs and what training they have. Build the appeals route that goes with it.
Third, enforce through design. Audit each high-stakes assessment for the conditions and verification mechanisms that protect its construct. Add vivas or structured follow-up for take-home work. Strengthen secure conditions where the credential certifies unaided competence. Update malpractice and appeals policies to reflect AI-specific scenarios.
If a detector is part of the operating model after this work, it should be there as a triage signal, not as a verdict.
Three actions this quarter
- Publish your candidate AI policy. Define allowed, prohibited, and required uses by assessment type, with disclosure rules visible before any component.
- Lock in human authority at every key decision point. No AI-only scoring decisions, integrity flags, or credential awards.
- Enforce through design. Add vivas or structured follow-up for take-home work. Strengthen secure conditions where unaided competence is the construct.
Credibility is designed in, not detected
The temptation to reach for a detection tool is understandable. It feels like a fast answer to a hard problem. The trouble is that the answer does not work, and the time spent defending it is time not spent building the integrity framework that does.
“Credibility is designed in, not detected.”
Ofqual and JCQ have made the pattern explicit. Authenticity is the obligation. Human authority is the safeguard. Design and verification are the enforcement mechanism. Detection has a narrow supporting role.
Credentialing organisations that build to that pattern will protect their credibility through whatever the next generation of AI tools brings. The ones that bet on detection will be replacing their approach again in twelve months, under pressure, in public, with a candidate appeal underway. The work to do this properly is available now. The advantage goes to whoever moves first. The companion articles on the EU AI Act, ISO 42001 and 23894, and the AERA, APA, and NCME Testing Standards cover the regulatory, governance, and validity strands of the same picture.
Ready to replace detection theatre with design-led assessment integrity?
Talk to our team about how Globebyte can help you build a credentialing integrity framework that survives an appeal.