In a packed session that blended live coding demonstrations with sobering security warnings, Jed Kafetz, CREST Council Member and Head of Commercial Innovation at Claranet UK, delivered a masterclass on why rapid AI development demands equally rigorous security testing. Speaking to an audience of public sector technology leaders, Kafetz demonstrated how applications that take mere minutes to build can harbour vulnerabilities affecting millions, a lesson McDonald's learnt the hard way just six weeks ago.
The 16-Minute Application That Exposed a Global Risk
Kafetz opened with a simple question: "Who here has done some vibe-coding before?" A smattering of hands went up across the theatre. For the uninitiated, vibe-coding refers to transforming natural language prompts ("make me a project management suite") into fully functioning applications in five to ten minutes using generative AI tools, work that would traditionally require three to four months of development.
To illustrate both the power and peril of this approach, Kafetz shared a video from the previous evening, showing him building a mock UK government recruitment portal whilst simultaneously watching television. The application, designed in the style of GOV.UK, took just 16 minutes to code and 10 minutes to test. It featured AI agents that parsed CVs, extracted relevant information, and matched candidates to suitable roles: impressive technology that, on the surface, solved a legitimate government challenge.
But appearances can be deceiving.
The Vulnerability Hidden in Plain Sight
As Kafetz demonstrated the application's functionality (uploading a CV, receiving automated profile summaries, applying for a "Director of Cyber Resilience" role), he drew the audience's attention to a seemingly innocuous detail: the number 14 appearing in the URL, despite only four job adverts being visible.
"From a hacker's perspective, you can see this vulnerability in every application you go on," he explained. "What happens if you change that 14 to three or four?"
He then posed a critical question to the room: "Should one government department be able to access another department's applications, their job descriptions and their people, the candidates?" The answer was obvious. But in his hastily built application, they could: a classic insecure direct object reference (IDOR) vulnerability that exposed all 15 test records.
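The flaw Kafetz described can be sketched in a few lines. The snippet below is an illustrative reconstruction, not his actual application: the record IDs, department names and function names are all invented. The point is that the vulnerable version trusts whatever ID arrives in the URL, while the fixed version binds each record to its owning department.

```python
# Hypothetical sketch of the IDOR pattern described in the session.
# All data and names are illustrative, not from the demo application.

APPLICATIONS = {
    3: {"department": "Home Office", "candidate": "A. Smith"},
    14: {"department": "Cabinet Office", "candidate": "B. Jones"},
}

def get_application_vulnerable(app_id: int) -> dict:
    """Insecure: trusts the ID taken straight from the URL, so changing
    /applications/14 to /applications/3 returns another department's record."""
    return APPLICATIONS[app_id]

def get_application_fixed(app_id: int, caller_department: str) -> dict:
    """Fixed: an authorisation check ties each record to its owning department."""
    record = APPLICATIONS.get(app_id)
    if record is None or record["department"] != caller_department:
        raise PermissionError("not authorised to view this application")
    return record
```

The fix is not exotic: it is a one-line ownership check that a security tester would expect on every object lookup, and exactly the kind of check that vibe-coded applications tend to omit.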
The demonstration took a darker turn when Kafetz logged in using a developer account with the password "123456", gaining access to the entire system from the internet. It was a textbook case of how quickly enthusiasm for AI-powered innovation can outpace fundamental security practices.
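The defence against that last failure is similarly unglamorous. The sketch below, which is illustrative rather than drawn from the session, shows the kind of denylist check that would have rejected "123456" before it ever reached production:

```python
# Minimal sketch (assumed names): rejecting the kind of default credential
# that opened both the demo portal and, as described later, McHire.

COMMON_DEFAULTS = {"123456", "password", "admin", "changeme", "letmein"}

def password_acceptable(password: str, min_length: int = 12) -> bool:
    """Reject well-known default passwords and trivially short ones."""
    return len(password) >= min_length and password.lower() not in COMMON_DEFAULTS
```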
The McDonald's Wake-Up Call
This wasn't merely a hypothetical scenario. Kafetz revealed that this exact vulnerability had been discovered approximately six weeks earlier in a real-world application used by a company "four times the size of the civil service": McDonald's.
The fast-food giant's AI-powered hiring platform, McHire, had been targeted by security researchers who, unable to compromise the chatbot itself, found their way in through a default password on a developer account. From there, they discovered the same type of vulnerability Kafetz had just demonstrated. The number they found wasn't 14; it was 62 million. The researchers reportedly accessed six applicant records before responsibly disclosing the issue.
"This shows the requirement for security testing and for accredited standards," Kafetz emphasised. "You need decent methodologies and decent accredited people in order to have your applications tested."
Understanding the LLM Security Model
Moving from demonstration to education, Kafetz introduced the audience to the LLM security model, drawn from Steve Wilson's "Developer's Playbook for LLMs", the same author behind the OWASP Top 10 for LLMs. The model places the large language model at the centre, surrounded by multiple interfaces: user inputs, text outputs, web search capabilities, training data, and more.
The critical insight, Kafetz stressed, is that between the LLM and each interface exists a security boundary. "When you're deploying these LLMs, you have to really consider there's a trust boundary," he said. For each interface, developers must ask:
- Is the authentication correct?
- Do we need logging and monitoring?
- Is the LLM training on this data, potentially capturing personal information permanently?
- Can we roll back the model if needed?
- Are we tracking user movements appropriately?
"This is a really important point when it comes to deploying applications in the future," Kafetz warned, urging the audience to think carefully about each trust boundary rather than treating the LLM as a monolithic black box.
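Kafetz did not show code for this, but his checklist lends itself to a simple pre-deployment review: enumerate each interface around the model and record the answer to each question. The sketch below is one possible shape for such a review; the class and field names are assumptions, not part of any published framework.

```python
from dataclasses import dataclass

# Illustrative sketch only: a trust-boundary review built from the five
# questions listed above. Field names are invented for this example.

@dataclass
class TrustBoundary:
    interface: str          # e.g. "user input", "web search", "training data"
    authenticated: bool
    logged_and_monitored: bool
    trains_on_data: bool    # does the model learn from data crossing this boundary?
    rollback_possible: bool
    user_tracking: bool

    def findings(self) -> list:
        """Return a human-readable issue for each failed check."""
        issues = []
        if not self.authenticated:
            issues.append(f"{self.interface}: authentication missing")
        if not self.logged_and_monitored:
            issues.append(f"{self.interface}: no logging or monitoring")
        if self.trains_on_data:
            issues.append(f"{self.interface}: model may retain personal data")
        if not self.rollback_possible:
            issues.append(f"{self.interface}: no model rollback path")
        if not self.user_tracking:
            issues.append(f"{self.interface}: user activity not tracked")
        return issues

# One boundary per interface around the LLM, reviewed independently.
boundaries = [
    TrustBoundary("user input", True, True, True, True, True),
    TrustBoundary("web search", False, True, False, True, False),
]
for boundary in boundaries:
    for issue in boundary.findings():
        print(issue)
```

The value of writing the review down, even this crudely, is that it forces the per-interface thinking Kafetz advocated instead of treating the LLM as a single black box.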
CREST: The Lion Mark for Cybersecurity
In the session's latter half, Kafetz pivoted to introduce CREST (Council of Registered Ethical Security Testers), the organisation he represents on its UK Council. Founded in 2006 with an initial focus on penetration testing, CREST has evolved into an international body providing certification, standards, and quality assurance across cybersecurity disciplines including incident response, security operations, threat intelligence, and red teaming.
"The best analogy I have for CREST is they are to cybersecurity what the Lion mark is for eggs," Kafetz explained, referencing the familiar British quality assurance symbol. Just as consumers trust the Lion mark when buying eggs, organisations should look for CREST certification when procuring cybersecurity services.
The organisation's growth has been substantial: it now boasts approximately 500 member organisations globally (250 in the UK), representing around 2,000 certified individuals. Its influence extends well beyond British shores, with CREST working alongside government bodies and regulators in Australia, America, Asia, Africa, and the Middle East, including national cybersecurity agencies in Thailand, Bahrain, and Qatar.
Of particular relevance to the public sector audience was CREST's 2014 alignment with the Bank of England to create the CBEST framework, a threat intelligence-led penetration testing approach that has since spawned similar frameworks internationally.
Practical Takeaways
When asked about the difference between CREST and CHECK, the UK government's own IT health check scheme, Kafetz clarified that whilst CHECK is a methodology for accessing the Public Service Network, CREST is the broader professional body that supports regulators, creates examinations, and provides accredited individuals who can deliver CHECK assessments among other services.
For public sector organisations looking to procure security testing services, Kafetz directed attendees to crest-approved.org, where they can use the "Find a supplier" directory to filter by discipline, regional speciality, certifications, and organisational size. "You can find an organisation that suits your needs and you can reach out to them. There's phone numbers and email addresses," he said.
His core message for the DigiGov community was threefold:
- Never underestimate the importance of security testing for AI applications and infrastructure, regardless of how quickly they can be built
- Always consider the security model of an LLM before deployment, thinking carefully about trust boundaries between the model and each interface
- Leverage accredited standards and certified professionals through organisations like CREST when procuring security services
The Speed-Security Paradox
As the session concluded, the central tension was clear: generative AI has democratised application development to an unprecedented degree, but it has not democratised security expertise. The same tools that enable a commercial innovation lead to build a recruitment portal whilst watching television can just as easily produce vulnerabilities affecting millions.
For the public sector, where data protection obligations are stringent and public trust is paramount, the message was unambiguous. The speed of innovation must be matched by the rigour of security testing. As Kafetz's McDonald's example demonstrated, even organisations with substantial resources can fall victim to basic security oversights when AI-powered development outpaces established security practices.
The future may be written in natural language prompts, but it still needs to be thoroughly tested by humans who know what they're looking for.
Liuba Pignataro



