Sequentur Blog
Voice cloning and deepfakes: the new business fraud you need to know about
A bookkeeper at a mid-sized firm gets a call. The number on the screen is the owner’s mobile. The voice is the owner’s voice – the same cadence, the same way he clears his throat before getting to the point. He is short on time, between meetings, and needs a payment pushed to a new vendor before end of day. He will explain later. The bookkeeper has worked with him for six years. She recognizes the voice without a flicker of doubt, because it is, by every signal a human being has ever used to recognize another human being, his voice.
It is not his voice. It is a clone, generated from thirty seconds of a podcast he appeared on last year, and the call is the entire attack. There is no malware, no phishing link, no compromised account. The fraud is built on one assumption that used to be completely safe and no longer is: that if you recognize someone’s voice, you are talking to that person.
This is the part of the AI threat picture that SMB owners most often wave off as something that happens to other, bigger companies. It is not. Voice cloning is cheap, fast, and good enough today. Deepfake video is close behind. This article is the dedicated treatment that the rest of the Sequentur AI cluster points to. The AI-powered phishing article handles the email side of impersonation fraud, and the article on how cybercriminals are using AI walks the full attacker playbook. This piece goes deep on the audio and video side: how the attacks actually work, the real fraud patterns, why small businesses are squarely in the blast radius, and the verification protocols that stop it. It is written for SMB owners, finance and operations staff, and in-house IT generalists who need to take this seriously before it is their incident.
Short answer
Voice cloning and deepfake fraud work by defeating the oldest identity check there is: recognizing a familiar voice or face. An attacker can clone a usable version of someone’s voice from under a minute of public audio – a podcast, a webinar, a conference talk, a social video, even a voicemail greeting – and use it on a live phone call to impersonate an owner, a CFO, or a vendor. Deepfake video is a step behind but already good enough to sustain a short, low-scrutiny video call. Small businesses are the ideal target because the controls that blunt this at large companies – segregated finance duties, layered payment approvals, formal verification workflows – are exactly the controls a 20-person business tends not to have. The defense is not a detection tool, because detection is unreliable and getting harder. It is a procedural one: any money movement, any change to banking or payment details, and any identity-sensitive request gets verified through a separate, pre-agreed channel before anyone acts, regardless of how certain the staff member is that they recognized the person. Recognizing the voice or the face is no longer evidence of anything. The rest of this article covers how the attacks work, the real fraud patterns, and the exact protocols to put in place.
Voice cloning and deepfakes at a glance
| Question | The short version |
|---|---|
| How much source audio does a voice clone need? | Under a minute is usually enough for a casual-context clone; a few minutes produces a better one |
| Where do attackers get the audio? | Podcasts, webinars, conference talks, YouTube, social video, company website videos, voicemail greetings |
| Can a clone hold a live phone call? | Yes – near-real-time voice cloning is good enough for a short, urgent call where the recipient is not scrutinizing it |
| Is deepfake video a real SMB threat yet? | Yes for short, low-scrutiny calls and recorded clips; sustained high-scrutiny video is still harder, but the gap is closing fast |
| Who gets targeted? | Finance and AP staff, bookkeepers, executive assistants, HR, IT support – anyone who can move money or grant access |
| What is the attacker after? | Wire transfers, changed banking or ACH details, gift cards, credentials, access, or records |
| Does caller ID help? | No – caller ID is trivially spoofed and was never authentication |
| What actually stops it? | Out-of-band verification on a pre-agreed channel, a code word for finance, and treating any single channel as fakeable |
| Is this a future concern? | No – voice cloning fraud is already in FBI advisories and SMB incident reports today |
| Does it need an AI tool to defend against it? | No – the defense is procedural, cheap, and mostly free |
The rest of the article walks the mechanics, the fraud patterns, and the protocols in detail.
How voice cloning actually works
It helps to be precise about the mechanics, because the precision is what makes the threat believable rather than abstract.
Modern voice cloning tools build a synthetic model of a person’s voice from a sample of their real speech. The technology improved sharply over the last few years on two fronts at once. The first is how little source audio it needs – early tools wanted hours of clean recordings; current ones produce a usable casual-context clone from well under a minute, and a convincing one from a few minutes. The second is speed – cloning moved from a slow, offline rendering job to something fast enough to drive a near-real-time conversation, where the attacker types or speaks and the target hears the cloned voice with little lag.
The source audio is the part SMB owners underestimate. You do not need to be famous to have your voice on the public internet. A single podcast guest spot, a recorded webinar, a conference or industry-panel talk, a “meet the team” video on the company website, a YouTube walkthrough, a local news interview, a social media clip – any of these is enough. Even an outgoing voicemail greeting is a clean, isolated sample of exactly the right kind. Business owners, in particular, tend to have more public audio than their staff, because marketing themselves is part of running the company. The more visible the leader, the easier the leader is to clone. That is an uncomfortable inversion of the usual instinct that visibility is good for business.
Once the attacker has the clone, the attack is a phone call. The cloned voice asks for something – a payment, a banking change, a gift card run, a password reset – usually with a built-in reason the recipient cannot easily verify in the moment (“I am about to walk into a meeting”, “I am traveling”, “the deal closes today”). The urgency is not a coincidence; it is engineered to push the target to act before the slow, careful part of their brain catches up. And because the voice is right, the target’s instinct is not suspicion. It is helpfulness.
How deepfake video fits in
Video is the next layer, and it is worth being measured about where it actually stands rather than alarmist.
Deepfake video – a synthetic or face-swapped video of a real person – has reached the point where it can convincingly sustain a short, low-scrutiny interaction: a brief video call where the “executive” confirms a transaction, a recorded clip dropped into a Teams or Slack channel, or a video message that adds a face to a request that would otherwise be just a voice. Real-time, high-quality video deepfakes that survive a long call with someone actively scrutinizing them are harder, but the gap is closing every quarter, and the low-scrutiny version is already good enough to cause losses.
The business contexts where deepfake video is already a live concern:
- Remote hiring fraud. The person who interviews on video is not the person who shows up to work – or who gets access to systems on day one. This has already hit SMB technical hiring, where remote roles and remote onboarding make the in-person check easy to skip.
- Vendor and partner onboarding. A video call used to “verify” the identity of a new business relationship is only as good as the assumption that the face on the call is real.
- High-value deal confirmation. An executive asked to confirm a large transaction over a quick video call, where the face and voice together create a sense of certainty that neither would alone.
- Compromised-account video. A genuinely compromised executive account used to post a short, manipulated video clip into an internal channel, carrying far more authority than a text message.
The combination that should worry an SMB most is voice plus video together. A request that arrives with both a familiar voice and a familiar face feels verified in a way that defeats the casual skepticism a text-only message might trigger. The defense, fortunately, does not change – it is the same out-of-band verification regardless of how many channels the attacker faked.
The fraud patterns small businesses actually see
The abstract threat becomes concrete in a small number of repeating patterns. These are the ones showing up in SMB incident reports and FBI public advisories.
The executive wire-fraud call. A finance staff member, bookkeeper, or executive assistant gets a call in the owner’s or CFO’s voice asking for an urgent wire or a payment to a new account. The reason is always something that makes a callback feel awkward – the executive is traveling, in a meeting, closing a deal. This is the single highest-loss pattern because it goes straight to money movement with no intermediate step.
The vendor banking-change call. Accounts payable gets a call from the “controller” or “billing contact” at a known, trusted supplier saying the supplier’s bank has changed and giving new ACH or wire details for upcoming invoices. Because the next real invoice is genuinely expected, the fraudulent banking change slots neatly into a routine the AP clerk already trusts. The loss surfaces weeks later when the real vendor asks where their payment went.
The IT-support pretext call. An IT helpdesk contact – internal or at an MSP – gets a call in a known employee’s voice asking for a password reset, an MFA re-enrollment, or a session unlock because they are “locked out before a client meeting.” The payoff here is not money directly; it is account access that leads to everything else.
The records-release call. A receptionist or administrator gets a call in a known professional’s voice – a referring physician, a partner, a client – asking for records, files, or documents to be sent to a new address or number. For a healthcare practice, this is a HIPAA breach in the making.
The remote-hire that is not real. A candidate interviews well on video, gets hired into a remote role, and either never does the work, or – the more damaging version – uses day-one access to systems and data for theft or fraud. The fake face on the interview call was the whole point.
The thread running through all of these: the attacker is not breaking a system. They are exploiting the gap between recognizing a person and verifying a person, and small businesses live in that gap because they rarely have a process that forces the second step.
Why small businesses are the easy target
There is a specific, structural reason this fraud works better against a 25-person company than a 2,500-person one, and it is worth stating plainly because it cuts against the “we are too small to be worth it” instinct.
Large companies are not safer because their staff are smarter. They are safer because their processes make a single tricked employee insufficient. A large company’s finance function has segregation of duties – the person who can initiate a payment cannot also approve it. It has payment thresholds that force multiple sign-offs above a certain amount. It has formal vendor-banking-change procedures that route through a verification step nobody can skip. A cloned voice that fools one person runs straight into a second and third control.
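The two controls named above, segregation of duties and a sign-off threshold, can be sketched in a few lines. This is a minimal illustration, not a description of any particular accounting system: the `Payment` type, the `can_release` function, and the 10,000 threshold are all hypothetical names and values chosen for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical policy value; pick an amount that fits your own business.
DUAL_APPROVAL_THRESHOLD = 10_000  # payments at or above this need two approvers


@dataclass
class Payment:
    amount: float
    initiator: str                                  # who set the payment up
    approvers: set[str] = field(default_factory=set)  # who has signed off


def can_release(payment: Payment) -> bool:
    """Return True only if the payment passes both controls."""
    # Segregation of duties: the initiator's own sign-off never counts.
    independent = payment.approvers - {payment.initiator}
    # Threshold rule: larger payments need a second independent approver.
    required = 2 if payment.amount >= DUAL_APPROVAL_THRESHOLD else 1
    return len(independent) >= required


if __name__ == "__main__":
    # One tricked employee who initiates and "approves" is not enough.
    tricked = Payment(amount=48_000, initiator="bookkeeper",
                      approvers={"bookkeeper"})
    print(can_release(tricked))  # False
```

The point of the sketch is that the check never asks whether the request sounded genuine. A cloned voice that convinces the initiator still leaves the payment short of independent approvals.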
A small business usually has none of that. It often has one bookkeeper, or one office manager, who can and will act on a direct instruction from the owner because that is how a small, trusting, fast-moving company runs. The flat structure and the personal trust that make a small business efficient are the exact properties voice-cloning fraud is built to exploit. There is no second control to run into.
The other half of it is the economics, covered in depth in the article on how cybercriminals are using AI: cloning a voice and placing a call used to carry enough cost and effort that attackers reserved it for big targets. AI collapsed that cost. Running the executive-wire-fraud call against a hundred small businesses is now cheap enough to be worth doing, and small businesses are no longer below the threshold of attacker attention. “Too small to target” was true when the attack was expensive. The attack is no longer expensive.
The defense: verification protocols that actually work
The instinct when faced with a new AI threat is to look for an AI tool that detects it. For voice cloning and deepfakes, that instinct is wrong, and it is important to say so directly. Detection tools for synthetic audio and video exist, they are improving, and they are useful as one layer – but they are an arms race the defender does not reliably win, and a small business should not build its protection on the assumption that it can tell a clone from the real thing. The durable defense does not depend on detecting the fake at all.
The durable defense is a verification protocol: a small set of rules that fire on the type of request, not on whether the request seemed suspicious. Because the protocol does not require anyone to spot the fake, it works even when the fake is perfect.
The core rule – out-of-band verification on money and access. Any payment, any wire, any change to banking or ACH details, any gift card request, and any sensitive access or records request gets verified through a second, pre-agreed channel before anyone acts. If the request came by phone, verification happens through a different channel – a callback to a known number from your own records (never a number supplied during the suspicious call), a message in an established internal channel, or in person. The request and the verification must travel over two different channels, because an attacker who controls one channel does not control both.
No exceptions for urgency. This is the rule that makes or breaks the protocol. Every one of these attacks is wrapped in a reason the verification is inconvenient right now. The protocol has to treat urgency as a reason for more caution, not an excuse to skip the step. The way to make that stick is to remove it from individual judgment: the staff member does not decide whether this particular call feels legitimate, because the rule fires on the request type. A wire request triggers verification. Always. Full stop.
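For teams that want the rule written down unambiguously, the logic of the core rule and the urgency rule can be expressed as a short sketch. Everything named here is an assumption made for illustration: the `Request` and `Verification` types, the `may_proceed` function, and the channel labels such as `inbound_call` and `callback_known_number` are hypothetical, and a real protocol would live on a one-page procedure rather than in software.

```python
from dataclasses import dataclass
from typing import Optional

# Request types that always trigger verification, taken from the rule above.
VERIFIED_REQUEST_TYPES = {
    "payment", "wire", "banking_change", "gift_card", "access", "records",
}


@dataclass(frozen=True)
class Request:
    kind: str             # e.g. "wire"
    channel: str          # channel the request arrived on, e.g. "inbound_call"
    urgent: bool = False  # recorded, but deliberately never used to skip a step


@dataclass(frozen=True)
class Verification:
    channel: str          # e.g. "callback_known_number", "internal_chat"
    contact_source: str   # "own_records" or "supplied_by_requester"


def may_proceed(request: Request, verification: Optional[Verification]) -> bool:
    """Decide on request type alone; how convincing the call felt is not an input."""
    if request.kind not in VERIFIED_REQUEST_TYPES:
        return True   # outside the protocol's scope
    if verification is None:
        return False  # no second-channel check has happened yet
    if verification.channel == request.channel:
        return False  # request and verification must use different channels
    if verification.contact_source != "own_records":
        return False  # never trust contact details supplied during the request
    return True


if __name__ == "__main__":
    wire = Request(kind="wire", channel="inbound_call", urgent=True)
    print(may_proceed(wire, None))  # False
    print(may_proceed(wire, Verification("callback_known_number", "own_records")))  # True
```

Note that `urgent` is stored but never read by `may_proceed`. That is the design choice the rule describes: urgency cannot lower the bar, because the decision never consults it.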
A code word for executive-to-finance authorization. A simple, low-tech, highly effective addition: the owner or executives and the finance team agree on a private verification phrase, known only to them and never spoken in public, in marketing, or in any recorded setting. Any verbal authorization for money movement has to include the code word. A voice clone can reproduce the executive’s voice perfectly and still has no way to know the phrase. It costs nothing and it defeats the entire executive-wire-fraud pattern.
Caller ID and recognition are not authentication. Staff need to internalize two things as flat facts. Caller ID is trivially spoofed and has never been proof of anything. And recognizing a voice or a face is, as of now, no longer proof either. Identity is established by the verification step, not by recognition. This is a genuine mental-model shift and it has to be taught explicitly, because every instinct people have runs the other way.
Verification of new identities, not just new requests. For remote hiring and vendor onboarding, the protocol extends to confirming that the person is who they claim to be through a channel the attacker cannot easily fake – a known reference, a verified document check, a step that does not rely on the video call itself. The video call is not the verification.
A blameless, fast reporting path. When something feels off, or when a staff member followed a request and only afterward grew uneasy, reporting has to be immediate and carry zero risk of blame. Speed of reporting is what limits the loss. An employee who fears looking foolish, or fears punishment for a mistake already made, will stay quiet – and silence is exactly what the attacker is counting on.
The whole protocol fits on one page. It costs almost nothing to implement. And it is worth being honest about the contrast: the loss from a single successful executive-wire-fraud call is frequently a five- or six-figure transfer that is never recovered, because by the time the fraud surfaces the money has moved through accounts and is gone. A one-page procedure against a six-figure loss is not a hard business case.
Where this fits with your other defenses
Voice cloning and deepfake fraud should not be treated as a standalone problem with a standalone fix. It is one face of the same impersonation threat that AI-generated phishing represents on the email side, and the verification habit that defeats it is the same habit that defeats AI-enhanced business email compromise.
That is the useful insight for an SMB with limited time and budget: you are not building three separate defenses for AI phishing, BEC, and voice cloning. You are building one verification discipline – money movement and sensitive access always get a second-channel check – and it covers all three at once. The AI-powered phishing article covers the email-side signals and controls, and the AI security risks overview places voice and video fraud alongside the other risks an SMB carries. Standard email security still matters here too, because many of these calls are preceded or followed by a phishing email, and the phishing attack prevention article covers that layer.
The protocol also belongs inside, not alongside, your broader security program. If your business runs a managed cybersecurity services engagement, the verification rules, the staff training, and the incident response for a successful fraud all belong in it – and an AI governance framework is where the policy lives and gets kept current as the technology moves.
10 things small businesses get wrong about voice cloning and deepfakes
The recurring misconceptions and gaps:
- “We are too small to be worth a cloned-voice attack.” True when the attack was expensive. AI made it cheap enough to run against small businesses at scale.
- Treating a recognized voice as proof of identity. Recognizing a voice is no longer evidence. A clone needs under a minute of public audio.
- Trusting caller ID. Caller ID is trivially spoofed and was never authentication. The familiar number means nothing.
- Assuming you need a detection tool. Detection is an arms race. The durable defense is a verification protocol that works without detecting the fake.
- No out-of-band verification on money movement. The single highest-value control – and the one most SMBs do not have.
- Allowing urgency to override the process. Every one of these attacks is wrapped in urgency. The protocol has to treat urgency as a reason for more caution, not less.
- No code word for executive-to-finance authorization. A free, low-tech control that defeats the entire executive-wire-fraud pattern.
- Skipping identity verification in remote hiring. A deepfake interview is now a real fraud vector. The video call is not the verification.
- Underestimating how much public audio the owner has. The most visible person in the company is the easiest to clone. Marketing visibility is cloning material.
- Punishing the employee who got fooled. Fear of blame produces silence, and silence lets the loss grow. Reporting must be fast and blameless.
Time to put voice and deepfake fraud protections in place
A practical sequence for a typical 20-50 person SMB closing the voice-cloning and deepfake gap:
| Phase | What happens | Time |
|---|---|---|
| Out-of-band verification rule | Write the rule for payments, banking changes, gift cards, and sensitive access. Put it on one page. | 2-3 days |
| Code word setup | Agree on a private verification phrase between executives and finance. Brief only the people who need it. | 1 day |
| Finance and AP briefing | Walk finance, AP, bookkeeping, and executive assistants through the patterns and the protocol | Half a day |
| All-staff awareness session | 10-15 minute briefing: voice and face are no longer proof, here is the protocol, here is how to report | 1 week to schedule and deliver |
| Hiring and vendor onboarding update | Add identity-verification steps that do not rely on the video call itself | 3-5 days |
| Reporting path | Confirm a fast, blameless way to report a suspicious call or a request already acted on | 2-3 days |
| Incident response check | Confirm the response plan covers a successful fraud – bank contact, law enforcement, internal escalation | 3-5 days |
| Total elapsed time | From “we should look at this” to a working set of protections | 2-3 weeks |
Almost all of this is procedure and communication, not technology or spend. The two-week version is a focused effort by an owner who wants it done. The three-week version accounts for scheduling the all-staff session around a busy calendar. There is no expensive tool on this list, which is the point – the defense against a six-figure fraud risk is mostly a page of rules and a short conversation.
What is next in this content series
This article closes the threat-side run of the Sequentur AI cluster – shadow AI, AI policy, data privacy, AI and HIPAA, AI security risks, Copilot, AI governance, how criminals use AI, AI phishing, and now voice cloning and deepfakes. The remaining pieces shift to the practical, build-it-right side:
- How to evaluate whether an AI tool is safe for your business to use – the questions to ask before approving any AI vendor
- How to introduce AI tools to your team without creating security gaps – the controlled-rollout playbook
- AI and data privacy laws – what small businesses need to understand as state AI rules take shape
- AI tools for small business IT – how MSPs use AI to support clients better
- What to do if an employee leaks business data through an AI tool – the incident response walkthrough
If you have not read them yet, the most directly related pieces are the AI-powered phishing article on the email side of impersonation fraud, the attacker-playbook overview of how cybercriminals are using AI, and the AI security risks catalog. The series also opens with the shadow AI wake-up call, the AI acceptable use policy template, and the AI governance framework article where the verification protocol belongs as written policy.
How Sequentur can help
If you want help writing the verification protocol, briefing your finance and AP staff, updating hiring and vendor onboarding, or folding this into a broader security program, schedule a call.