Cybersecurity experts have demonstrated how artificial intelligence can be weaponized to clone a person’s voice in real time — enabling highly convincing voice phishing (“vishing”) attacks that can trick employees into sharing sensitive information or taking unauthorized actions.
Researchers from NCC Group revealed in a company blog post that they had run real-world tests of real-time voice cloning against organizations, with consent, and successfully obtained confidential data.
“We’ve shown how these techniques can persuade people in key operational roles to perform actions on behalf of an attacker,” wrote researchers Pablo Alobera, Pablo López, and Víctor Lasa. “In simulated attack scenarios, we’ve managed to execute changes such as password resets and email address updates.”
How Real-Time Voice Cloning Works
When the NCC team began their project, they found that most existing deepfake voice technologies only worked with pre-recorded samples — meaning they couldn’t adapt or respond dynamically during a live call.
To overcome this limitation, the researchers created a system that routes an attacker’s microphone input through a machine learning (ML) model trained on a target’s voice. As the attacker speaks, the victim hears the cloned version — complete with tone and cadence that sound convincingly real.
This audio feed can be directed into common communication apps like Microsoft Teams or Google Meet, allowing attackers to impersonate trusted colleagues or executives during live calls.
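NCC Group has not published its implementation, but the flow it describes can be sketched in a few lines. The snippet below is a minimal illustration only: it assumes the Python sounddevice library for audio I/O, uses a hypothetical convert_voice() stub in place of the trained voice-conversion model, and notes where a virtual audio cable (such as VB-Cable) would let Teams or Meet pick up the converted stream as a microphone. None of these specifics come from the researchers.

```python
# Minimal sketch of the described pipeline: microphone -> ML voice
# conversion -> (virtual) output device. The model is stubbed out.
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 16_000   # a common rate for speech models
BLOCK_SIZE = 4_000     # 250 ms chunks; a real system tunes this for latency

def convert_voice(chunk: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a voice-conversion model trained on the
    target's voice. Here it simply passes the audio through unchanged."""
    return chunk

def callback(indata, outdata, frames, time, status):
    # Called by sounddevice for each audio block: feed mic input through
    # the model and write the result to the output device.
    if status:
        print(status)
    outdata[:] = convert_voice(indata)

# With the default devices this is a plain passthrough. In the scenario the
# article describes, the output device would instead be a virtual cable
# (e.g., VB-Cable), which the calling app then selects as its microphone.
with sd.Stream(samplerate=SAMPLE_RATE, blocksize=BLOCK_SIZE,
               channels=1, callback=callback):
    input("Streaming converted audio... press Enter to stop.")
```

The design point is simply low per-block latency: as long as each chunk is converted faster than it is captured, the victim hears the cloned voice with no perceptible delay.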
To make their proof-of-concept more realistic, the researchers even spoofed the caller’s phone number (with consent), mimicking a familiar contact to increase the likelihood that targets would trust the call.
“The reality today is that the tools and infrastructure needed for real-time voice cloning are easily accessible — even to individuals or small groups with limited technical and financial resources,” the researchers wrote.
They noted that all their results were achieved with hardware and software that were simply “good enough,” not cutting-edge.
Why It’s a Game Changer for Scammers
Experts warn that real-time voice cloning significantly raises the stakes for vishing attacks.
“Victims rely on the caller’s number, voice, and message content — all of which can now be spoofed or cloned,” said Matthew Harris, Senior Product Manager for Fraud Protection at Crane Authentication. “This makes scams more believable and increases success rates.”
Brandon Kovacs, Senior Security Consultant at Bishop Fox, compared it to the difference between reading from a script and having a live conversation:
“Real-time voice conversion lets attackers respond, escalate authority, and handle questions on the fly — especially when combined with deepfake video on Zoom or Teams.”
T. Frank Downs of BlueVoyant called the technique a “force multiplier”:
“It allows attackers to adapt tone and context in real time, making the impersonation nearly impossible to detect during the call.”
Easier Than Ever to Use
While high-quality voice cloning used to require specialized skills, today’s generative AI tools are rapidly lowering the barrier.
“Some voices are harder to clone than others, but AI is improving fast,” said Roger Grimes, CISO Advisor at KnowBe4. “Modern models use probabilistic pattern matching to deliver far better results than older tools — though they still struggle with underrepresented voices and languages.”
Grimes predicts that by 2026, most voice-based social engineering attacks will be powered by AI:
“By then, most vishing calls won’t involve a real human voice. Social engineering is about to change forever.”
Deepfake Impersonations on the Rise
Even without real-time AI, fake CEO voice messages are already emerging as a serious threat.
“It only takes a short recording and a tool like ElevenLabs to produce a convincing fake,” said Alex Quilici, CEO of YouMail. “We expect these to become the next major attack vector.”
So far, such voice attacks remain relatively rare — but text-based impersonation scams are booming.
“We’re seeing widespread SMS campaigns pretending to be executives, urging employees to take immediate action,” Quilici noted.
Protecting Against AI-Driven Social Engineering
AI-powered impersonation attacks are intensifying, warned Marc Maiffret, CTO of BeyondTrust.
“The key defense is human vigilance and robust identity security,” he said. “Enforce least privilege, monitor identity infrastructure, and restrict access to sensitive accounts. Even if credentials are stolen, limit what attackers can do.”
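That advice can be made concrete. The sketch below is an illustrative example of one such control, an out-of-band verification step a help desk could require before honoring the kinds of requests NCC Group's tests succeeded with, such as password resets. The directory contents, function names, and the send_via_registered_channel() stub are hypothetical, not from BeyondTrust; the idea is that trust anchors on a pre-enrolled channel rather than on anything the caller presents, since caller ID, voice, and message content can all be faked.

```python
# Sketch: require proof over a pre-registered channel before acting on any
# voice request. All names and data here are hypothetical placeholders.
import secrets

# Stand-in for the IdP / HR directory of pre-enrolled contact channels.
DIRECTORY = {
    "e1001": {"name": "A. Example", "registered_channel": "+1-555-0100"},
}

def send_via_registered_channel(channel: str, code: str) -> None:
    # Stand-in for delivery over an enrolled channel (SMS, authenticator push).
    print(f"[out-of-band] code sent to {channel}")

def verify_before_sensitive_action(employee_id: str) -> bool:
    """Send a single-use code out of band and require it back before a
    password reset, email change, or similar action proceeds."""
    record = DIRECTORY.get(employee_id)
    if record is None:
        return False
    code = secrets.token_hex(3)  # single-use, 6 hex characters
    send_via_registered_channel(record["registered_channel"], code)
    supplied = input("Code read back by the requester: ").strip()
    return secrets.compare_digest(supplied, code)

if __name__ == "__main__":
    if verify_before_sensitive_action("e1001"):
        print("Verified: proceed with the reset.")
    else:
        print("Verification failed: do not act on the call.")
```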
He added that AI-based social engineering underscores a fundamental shift in cybersecurity:
“Identity is the new perimeter — and deepfakes prove why securing it matters more than ever.”
What’s Next
The NCC Group researchers are now turning their attention to deepfake video impersonations. Early results show challenges in synchronizing real-time audio and video, but progress is accelerating.
“Given the unprecedented speed of advancement,” they concluded, “a realistic real-time deepfake that synchronizes both voice and video is only a matter of time.”