Sequentur Blog
What data are you feeding into AI tools, and why it matters
Most small businesses jumped straight from “AI is interesting” to “AI is in use” without the step in the middle – the one where someone asks what actually happens to the data after staff press send. Every time an employee pastes an email draft, a customer name, a contract clause, a spreadsheet row, a stack trace, or a meeting transcript into ChatGPT, Copilot, Gemini, or Claude, that information leaves your business. Where it goes, how long it lives there, who can see it, and whether it is used to train future models depends entirely on which version of the tool the employee is using and which tier you are paying for. Most owners do not know which version their staff are using. Most staff do not know either.
This article is the data-side companion to the shadow AI wake-up call and the AI acceptable use policy template. Before you set a policy, you have to understand what each AI vendor actually does with your inputs in their free, paid-consumer, business, and enterprise tiers – because the gap between “ChatGPT” and “ChatGPT” is not cosmetic. It is the difference between “the vendor reserves the right to train on your data” and “the vendor contractually agrees not to retain or train on your data.” Those are two different products with the same name.
It is written for SMB owners, operations managers, in-house IT generalists, and the compliance-anxious person at the conference table who keeps asking “but is that actually safe?” If you handle PHI, PII, payment data, regulated financial data, client data under NDA, or anything covered by a confidentiality clause, this is foundational reading. If you handle nothing regulated and just want to know whether your employees are quietly leaking the customer list, this is still foundational reading.
Short answer
Large language models do not have a single uniform privacy posture – each AI tool is a family of products with different data terms per tier. As a rule of thumb: free and personal-account tiers tend to retain inputs and may use them to train future models; paid Team, Business, and Enterprise tiers tend not to. What changes per vendor is the specific retention window, opt-out controls, regional differences, what “data” means in the terms (chat history vs prompts vs uploads), and whether a Business Associate Agreement is available for regulated industries. The categories of data that should never enter an unmanaged consumer AI tool, regardless of vendor: personally identifiable information (PII), protected health information (PHI), payment data, regulated financial records, client information under confidentiality, source code with embedded secrets, employee personnel data, and internal strategy that would harm the business if it leaked. The fastest workable move is to classify your data into three tiers, decide which AI tools and which licensing tier each tier is allowed in, and write the rule down. The most expensive mistake is assuming every tier of every AI tool behaves the same way – they emphatically do not.
AI data privacy at a glance
| Question | Short answer |
|---|---|
| Does ChatGPT train on my data? | The free and Plus tiers historically have, with opt-out available. ChatGPT Team, Enterprise, and the API do not train on inputs by default. |
| Is Copilot safe for business data? | Copilot for Microsoft 365 (the licensed product, not the free consumer Copilot) operates under M365 enterprise data protection terms. Free Copilot does not. |
| What about Gemini? | Gemini in the free Google account is treated under consumer terms with human review possible. Gemini for Google Workspace is treated under Workspace data protection terms. |
| What about Claude? | Claude.ai consumer has training controls that vary. Claude for Work and the Anthropic API are designed to not train on customer inputs by default. |
| Is there a BAA available for HIPAA? | Yes for Microsoft Copilot for M365 with appropriate licensing, the Anthropic API, the OpenAI API and ChatGPT Enterprise, and Google Gemini for Workspace with the right add-on. Always verify vendor terms at the time you sign. |
| What data should never go into any consumer AI tool? | PHI, PII, payment data, regulated financial data, source code with secrets, client data under NDA, confidential strategy, employee personnel data |
| How long is data retained? | Varies by vendor and tier – days for some enterprise products, indefinitely or “as long as the account exists” for some consumer products |
| Where does my data physically go? | To the vendor’s cloud infrastructure, usually US-based, sometimes EU-routed for European customers. Check the vendor’s regional data terms. |
| What is the cheapest first step? | Write a three-tier data classification rule, map each tier to allowed AI tools and tiers, and tell staff |
| What is the biggest misconception? | “ChatGPT is ChatGPT” – it is not. The free, Plus, Team, Enterprise, and API products have meaningfully different data terms. |
How large language models actually handle your inputs
Before going vendor by vendor, it helps to understand what happens to an AI prompt mechanically. Every prompt you send to a large language model goes through roughly the same lifecycle, though the specifics differ between products.
Step 1: transmission. Your prompt – the text you typed, any documents you attached, the conversation history – is sent over an encrypted connection to the vendor’s servers. This is no different from any other SaaS request.
Step 2: processing. The vendor’s infrastructure runs your prompt through the model. The model itself is read-only at inference time; it does not learn from a single request on the spot. What it produces is a response, sent back to you over the same kind of encrypted connection.
Step 3: logging. This is where the products diverge. Most AI products log prompts and responses for some period for safety review, abuse detection, support, and debugging. The default retention period, who has access, and whether the logs feed into anything downstream is the part that varies by tier.
Step 4: retention. Conversations are typically saved to your account so you can return to them. How long they live, whether they live indefinitely, and whether you can delete them is set by the tier and the vendor.
Step 5: model training (this is the part most people miss). Some products use customer prompts to improve future model versions. Free and consumer tiers historically have. Enterprise tiers contractually do not. The training is rarely real-time and rarely targeted at one customer – it is aggregated, sampled, and curated. But “rarely real-time” and “contractually do not train” are very different statements, and only one of them is enforceable in a vendor contract.
Step 6: human review (the part even fewer people know about). Most AI vendors have human reviewers who look at flagged or sampled conversations to improve the model and check for misuse. Free and consumer tiers usually allow this by default. Enterprise tiers typically restrict it or disable it entirely. If you would be uncomfortable with a contractor at the vendor reading your conversation, you should not be using the consumer tier for that conversation.
The takeaway: the same model can be wrapped in very different product terms. The wrapping is what determines whether a particular use case is safe. Once you internalize that, the vendor-by-vendor differences become much easier to read.
Vendor by vendor: what the major AI tools do with your data
Vendor terms shift, sometimes more than once a year, so always verify current terms at the moment you sign or license. The shape below describes the pattern as of the most recent stable terms in each vendor’s documentation.
OpenAI: ChatGPT and the API
OpenAI publishes different terms across at least five products: ChatGPT Free, ChatGPT Plus (and the related personal tiers), ChatGPT Team, ChatGPT Enterprise, and the API.
- ChatGPT Free and Plus. Historically, conversations have been used to improve OpenAI’s models by default. Users can opt out via a setting (Data Controls). Conversation history is retained on the account unless deleted, and even deleted conversations are typically retained for 30 days for abuse review. Free accounts may also have content reviewed by humans for safety classifier improvement.
- ChatGPT Team. Designed for small business and team use. By default, OpenAI does not train on Team customer content. Workspace administrators have visibility over usage. Conversations are retained according to admin settings.
- ChatGPT Enterprise. Designed for enterprise customers, with SOC 2 compliance, SAML SSO, encryption at rest, and a contractual commitment that customer content is not used for training. Retention is admin-controlled. Available with a BAA for HIPAA-regulated customers under appropriate enterprise licensing.
- OpenAI API. Customer prompts and outputs are not used to train models by default (this was changed in 2023 from the previous default). API customers can also request zero-data retention for eligible endpoints, which means OpenAI does not store the prompt or output at all beyond the moment of inference. BAA available for healthcare customers using compliant configurations.
The practical rule: if you handle anything sensitive, the line is between ChatGPT Plus and ChatGPT Team. Plus is a consumer product wearing a paid label. Team is the smallest tier where the data-protection terms shift in favor of the customer.
Microsoft: Copilot for Microsoft 365 vs free Copilot
Microsoft sells two distinct Copilot families that share a name and almost nothing else from a data-protection perspective.
- Copilot (free, consumer, formerly Bing Chat). Operates under consumer terms. Conversations may be retained and used to improve Microsoft services depending on the consumer Microsoft account settings. Not appropriate for any business data more sensitive than what you would post on social media.
- Copilot Pro. Consumer subscription tier. Slightly better controls than the free consumer Copilot, but still consumer terms. Not appropriate for regulated data.
- Copilot for Microsoft 365 (licensed). The product that integrates with Outlook, Word, Excel, Teams, and SharePoint. Operates under M365 enterprise data protection terms. Customer prompts, responses, and grounding data are not used to train foundation models. Data stays within the tenant’s compliance and security boundary. Available with a BAA under HIPAA-eligible M365 plans. This is the version a business should be licensing if Copilot is part of its workflow.
- Microsoft 365 Copilot Chat (free for licensed users, formerly Bing Chat Enterprise). A web chat interface available to organizations with eligible M365 licenses. Provides enterprise data protection on chat content. Does not have access to organizational data the way licensed Copilot does, but the chat content itself is treated under enterprise terms.
The single most important Copilot distinction: free Copilot is not Copilot for Microsoft 365. The names are similar, the products are different, and the data terms are different. Audit which version each staff member is actually using before assuming the licensed protections apply.
Google: Gemini consumer vs Gemini for Workspace
Google’s split is similar to Microsoft’s, with a consumer product, a Workspace product, and a developer API under the Gemini name.
- Gemini in a consumer Google account. Conversations are saved to your Google account for a configurable retention period. Conversations may be reviewed by human reviewers as part of Google’s quality and safety processes. Gemini Apps activity is on by default; users can adjust retention. Not appropriate for business confidential data.
- Gemini for Google Workspace. Operates under Workspace data protection terms. Customer content is not used to train Google’s models. Human reviewers do not have routine access. Available with eligible Workspace licensing. BAA available for HIPAA-regulated customers on appropriate Workspace plans.
- Gemini API on Google Cloud. Enterprise data protection through Google Cloud. Configurable retention.
If your staff use Gmail with a personal Google account and have toggled on Gemini, none of the Workspace protections apply – they are using the consumer product. This is one of the most common shadow AI configurations in SMBs that use Google Workspace.
Anthropic: Claude consumer vs Claude for Work and the API
Anthropic publishes explicit data-handling commitments, with a similar split between consumer plans and business products.
- Claude.ai consumer. Free, Pro, and Max consumer tiers. Anthropic’s terms on training have shifted – newer terms allow training on consumer conversations by default with opt-out controls. Conversations are retained per account settings. Not appropriate for sensitive business data without explicit configuration.
- Claude for Work (Team and Enterprise). Customer prompts and outputs are not used to train models. Conversations are workspace-scoped, with admin controls. Designed for business and enterprise use. Available with appropriate enterprise terms.
- Anthropic API. API customer prompts and outputs are not used to train models by default. Data retention is limited to operational needs (typically 30 days for safety review unless zero-data-retention is configured for eligible endpoints). BAA available for HIPAA-regulated customers under appropriate enterprise configurations.
GitHub Copilot and other code AI
Code AI tools deserve special mention because the data they handle – source code – is one of the most sensitive categories in many SMBs.
- GitHub Copilot Individual. Code snippets sent for completion. Telemetry settings configurable. Default behavior allows GitHub to use code suggestions and prompts to improve the service.
- GitHub Copilot Business. Designed for organizations. By default, GitHub does not retain or use customer code or prompts to train models. Customer code is processed to generate the completion and then discarded.
- GitHub Copilot Enterprise. Adds enterprise controls, SSO, audit logging, and additional administrative features. Same data protection as Business.
Same pattern as the rest: Individual is a consumer tier. Business is where the data terms shift to favor the customer.
AI features inside SaaS tools
Most SMBs have far more AI exposure through AI features baked into existing SaaS than through standalone AI tools. Zoom AI Companion, Otter.ai, Fireflies, Read.ai, Notion AI, Slack AI, HubSpot Breeze, Salesforce Einstein, Calendly’s AI scheduling, Gong, Chorus, Avoma – each one ingests meeting audio, transcripts, documents, customer interactions, or sales calls and routes them through an AI service. Each one has its own data terms. Many use third-party model providers (often OpenAI or Anthropic) under their own contracts with those providers.
The rule of thumb for embedded AI features: the data terms are whatever the SaaS vendor negotiated with their model provider, wrapped in whatever terms the SaaS vendor offers you. Read both. Ask explicitly whether your meeting recordings, transcripts, or customer data are used to train any model – your provider’s, their subprocessor’s, or anyone else’s. Get the answer in writing. Some vendors will say “no” cleanly. Others will say “it depends,” which is a “yes” with extra steps.
The difference between consumer and enterprise AI plans, in plain terms
The differences are not subtle. Worth stating them flatly:
| Property | Free / consumer tier | Team / Business tier | Enterprise / API tier |
|---|---|---|---|
| Training on inputs by default | Usually yes (opt-out available) | No | No |
| Human review of conversations | Routine for quality / safety | Restricted | Restricted or disabled |
| Data retention | Indefinite by default, per account | Admin-configurable | Admin-configurable, often shorter |
| Workspace / tenant scoping | None – individual account | Yes – workspace | Yes – tenant or organization |
| Admin visibility into usage | None | Yes | Yes – with audit logs |
| SSO / SAML | No | Sometimes | Yes |
| BAA available for HIPAA | No | Sometimes | Yes (verify per vendor and tier) |
| SOC 2 / ISO 27001 alignment | Inherited from vendor | Inherited from vendor | Yes – explicitly attested |
| Zero data retention option | No | No | Yes (vendor-dependent, eligible endpoints) |
| Contractual commitments on data | Consumer terms only | Better, vendor-specific | Yes – negotiable in some cases |
The columns are blunt and the lines between them are real. A staff member using ChatGPT Plus and a staff member using ChatGPT Team are not using the same product. The first is your shadow AI exposure. The second is your governed AI workflow.
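For readers who keep their approved-tools list in a script or spreadsheet, the table above can be encoded as data. This is a minimal sketch, not a vendor-verified matrix: the `TierTerms` fields, the `TERMS` dictionary, and the `candidate_tiers` function are hypothetical names, and the values are copied from the table rather than from any vendor contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierTerms:
    trains_by_default: bool   # "Usually yes" in the table is encoded as True
    admin_visibility: bool
    baa: str                  # "no", "sometimes", or "yes"
    zero_retention: bool      # vendor-dependent even where True


# Values mirror the comparison table; always re-check current vendor terms.
TERMS = {
    "consumer": TierTerms(True, False, "no", False),
    "business": TierTerms(False, True, "sometimes", False),
    "enterprise": TierTerms(False, True, "yes", True),
}


def candidate_tiers(need_no_training=False, need_baa=False, need_zero_retention=False):
    """Return tiers that could meet the stated needs.

    A "sometimes" BAA still counts as a candidate and must be verified per vendor.
    """
    result = []
    for name, terms in TERMS.items():
        if need_no_training and terms.trains_by_default:
            continue
        if need_baa and terms.baa == "no":
            continue
        if need_zero_retention and not terms.zero_retention:
            continue
        result.append(name)
    return result
```

Calling `candidate_tiers(need_zero_retention=True)` narrows the list to the enterprise column, which matches how the table reads. A result from this kind of lookup is a shortlist to verify with the vendor, not a compliance answer.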
How to classify your data before you set any AI policy
The data-classification step is what makes the rest of your AI governance actually work. Without it, every AI conversation in the business is a coin flip – safe or not safe, depending on what the employee happened to be working on that day. With it, the rule becomes simple: “this tier of data is allowed in this tier of tool. Period.”
The simplest workable framework is three tiers, mapped to AI-handling rules.
Tier 1: Public / General
Information that would be safe to publish externally – marketing copy already on the website, product documentation that is already public, generic process descriptions, draft material that is not customer-bound and not strategy. Examples: a blog post draft, a tagline brainstorm, a generic email template, a learn-this-topic question.
Allowed in: Any approved AI tool, including free consumer ones, subject to your prohibited-tools list. There is no business reason to restrict Tier 1 data from broad AI use, and restricting it makes staff route Tier 2 and Tier 3 data through unsanctioned channels because they have stopped reading the policy.
Tier 2: Internal
Information meant to stay within the business but not regulated and not contractually restricted. Examples: internal process documents, vendor communications that are not under NDA, internal-only meeting notes, draft material that has not been shared externally, financial data that is not regulated (the cleaning company’s revenue numbers, not a public company’s pre-release earnings).
Allowed in: AI tools with enterprise data protection terms (ChatGPT Team or Enterprise, Copilot for M365, Claude for Work, Gemini for Workspace, equivalent Business / Enterprise tiers of embedded SaaS AI). Not allowed in free consumer AI.
Tier 3: Sensitive / Regulated
Information that is regulated, contractually restricted, or would cause meaningful business or legal harm if it leaked. Examples: PHI, PII (including SSNs, driver’s license numbers, financial account numbers), payment card data, employee personnel data, salary information, client data under NDA, source code with embedded secrets, attorney work product, internal strategy documents, board materials, M&A material.
Allowed in: AI tools with enterprise data protection terms AND a contract that covers the specific data type. For PHI: a tool with a signed BAA. For PCI: a tool that is in your PCI scope or explicitly out of scope. For client data under NDA: a tool where the contract aligns with the NDA terms. Never in free consumer AI. Sometimes not in business tiers either, depending on the specific data type.
Some businesses prefer four tiers (Public / Internal / Confidential / Regulated). Four works. Five does not – staff cannot reliably remember more than four data tiers in their head, and a rule no one can remember is not a rule. Three is the floor and the most common choice for SMBs.
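The three-tier rule is simple enough to express as a few lines of logic, which is a useful test of whether the policy is unambiguous. The sketch below assumes the three tiers described above and collapses AI tools into two groups (consumer, and anything with enterprise data protection terms). The names `DataTier`, `ToolTier`, and `is_allowed` are illustrative, and the prohibited-tools list is not modeled.

```python
from enum import IntEnum


class DataTier(IntEnum):
    PUBLIC = 1      # Tier 1: Public / General
    INTERNAL = 2    # Tier 2: Internal
    SENSITIVE = 3   # Tier 3: Sensitive / Regulated


class ToolTier(IntEnum):
    CONSUMER = 1    # free or personal-account AI
    BUSINESS = 2    # tools with enterprise data protection terms


def is_allowed(data: DataTier, tool: ToolTier, contract_covers_data: bool = False) -> bool:
    """Return True if this data tier may go into this tool tier."""
    if data is DataTier.PUBLIC:
        # Tier 1 may go into any approved tool, including consumer ones.
        return True
    if data is DataTier.INTERNAL:
        return tool >= ToolTier.BUSINESS
    # Tier 3 needs enterprise terms AND a contract covering the data type
    # (for example, a signed BAA for PHI).
    return tool >= ToolTier.BUSINESS and contract_covers_data
```

For example, `is_allowed(DataTier.SENSITIVE, ToolTier.BUSINESS)` is `False` until `contract_covers_data=True` is passed, which mirrors the point that a business tier alone is sometimes not enough for Tier 3 data.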
What categories of data should never go into an unmanaged AI tool
Independent of your tiering, there are categories of data that should not enter any AI tool that is not on your approved-and-contracted list. Name them explicitly in your policy and training rather than leaving them implied by the tiering framework: the categories map to specific employee tasks, and they are what staff will recognize in the moment.
- Protected Health Information (PHI). Patient names linked to diagnoses or treatment, medical record numbers, insurance numbers tied to health data, anything in a HIPAA-covered transaction. Putting PHI into a non-BAA-covered AI tool is a HIPAA breach in itself, full stop, even if no one outside the vendor sees it. The breach is the unauthorized disclosure to the vendor. (More on this in AI and HIPAA: what healthcare businesses need to know once that article goes live.)
- Personally Identifiable Information (PII). Social Security numbers, driver’s license numbers, passport numbers, dates of birth combined with names, financial account numbers, biometric identifiers. State privacy laws (California, Colorado, Connecticut, Virginia, Utah, and others) plus GDPR for any EU residents create real obligations around where this data flows and who processes it. A consumer AI vendor is not a processor you have a contract with.
- Payment card data. PAN (the long number on the front), CVV, magnetic stripe content, full track data. PCI DSS scope follows this data wherever it goes. Pasting a card number into an AI tool puts the AI tool in PCI scope, which it almost certainly is not designed for.
- Regulated financial data. GLBA-covered customer financial information at financial institutions, regulated investment advice records under FINRA / SEC rules, tax preparer client records under IRS Section 7216. The regulation usually defines who can process the data on your behalf. Consumer AI vendors are not on the list.
- Source code with embedded secrets. API keys, database connection strings, OAuth client secrets, signing keys, internal hostnames and IP addresses, private network architecture details. Even if the code itself is not particularly sensitive, the secrets in it are. Engineers paste production code into consumer AI tools constantly for debugging help.
- Client data under NDA or confidentiality. Most client master service agreements have a confidentiality clause that prohibits sharing client information with third parties without consent. The consumer AI vendor is a third party. Pasting in a client communication, document, or technical detail is a contractual breach the moment it happens, regardless of whether anyone notices.
- Internal strategy. Pricing strategy, acquisition pipeline, layoff plans, financial forecasts, competitive intelligence, board materials. Once outside the business, you have no control over where it sits.
- Employee personnel data. Salaries, performance reviews, disciplinary records, candidate evaluations, immigration paperwork, medical accommodations, complaints. Multiple regulatory frameworks intersect here, and the employee did not consent to having their personnel record processed by a third-party AI vendor.
- Credentials. Passwords, MFA recovery codes, API tokens. Staff paste these in when troubleshooting and rarely realize they did. The credential is now in the vendor’s logs.
- Confidential legal material. Litigation strategy, attorney-client privileged communications, draft contracts under negotiation. Privilege can be waived by disclosure to third parties. The consumer AI vendor is a third party.
This list is not exhaustive, but it covers what shows up in real SMB shadow AI incidents. If a category of data does not appear here and your staff handles it, ask the question: “would I be comfortable if a contractor at the AI vendor read this conversation?” If the answer is no, the data does not belong in a consumer AI tool.
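A few of these categories have recognizable shapes, which makes a rough pre-send check possible. The sketch below is illustrative only: the patterns and the `find_sensitive` helper are assumptions for this example, they cover only US-style SSNs, card-like digit runs, and obvious `key=value` secrets, and they are no substitute for a real DLP product. The Luhn checksum is the standard check-digit test for card numbers, used here to reduce false positives.

```python
import re

# Illustrative patterns only; real DLP tooling uses far more robust detection.
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
SECRET_RE = re.compile(r"(?i)\b(?:api[_-]?key|secret|password|token)\b\s*[:=]\s*\S+")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_sensitive(text: str) -> list[str]:
    """Return the categories of sensitive data that appear to be in text."""
    hits = []
    if SSN_RE.search(text):
        hits.append("ssn")
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            hits.append("payment_card")
            break
    if SECRET_RE.search(text):
        hits.append("credential")
    return hits
```

A check like this catches the accidental paste, not the determined one. It will miss PHI, client names, and strategy documents entirely, which is why the contractor question above remains the better test for everything without a fixed format.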
What enterprise tiers actually change
Paying for the enterprise tier is not magic. What it actually does, in concrete terms:
- Contractual commitment that customer content is not used for training. This is the most important change. It moves the data terms from “you accepted our consumer privacy policy” to “we signed a contract with you that says X.”
- Workspace or tenant isolation. Your data is segregated from other customers’ data, with admin controls over who in your organization can see what.
- Admin visibility. Logging, audit trails, usage reports. The kind of visibility you need when an auditor asks “who accessed AI tools last quarter and what for?”
- Identity integration. SSO, SAML, conditional access. This means the AI tool is gated by the same identity controls as the rest of your stack, and access is shut off when an employee leaves.
- Configurable retention. Including, for some vendors and some endpoints, a zero-data-retention option where the vendor does not store the prompt at all.
- BAA availability. For regulated industries. Without a BAA, PHI in the tool is a breach. With a BAA, the AI use case is potentially compliant – subject to the specific data and use case actually being in scope.
- Reduced or eliminated human review. Enterprise tiers typically restrict or disable routine human review of conversations.
- Security attestations. SOC 2 Type II, ISO 27001, sometimes FedRAMP. These are what your insurer, your auditor, and your enterprise clients will ask for.
What enterprise tiers do not change:
- They do not stop staff from pasting Tier 3 data into the wrong tool. That is the job of policy, training, and technical controls.
- They do not make AI hallucination go away. The model still confidently invents things.
- They do not make output review unnecessary. AI-generated material still needs human review before it goes to a client.
- They do not give you full deletion guarantees in every case. Logs may still be retained briefly for safety review even under “zero retention” terms.
- They do not exempt you from notifying clients, regulators, or insurers about AI use where contractual or regulatory obligations require disclosure.
How to actually decide which AI tier to license
For most SMBs, the decision is not “do we license enterprise AI?” – it is “which staff need which tier, and what is the cheapest way to get the protections we need?”
The pragmatic path:
Step 1: identify who is using AI for work. Survey honestly, with amnesty for past use, and assume the real number is 1.5x what staff self-report. The shadow AI baseline rate is between 60% and 75% of knowledge workers.
Step 2: identify what data they are touching. Map roles to data tiers. Marketing coordinator pastes Tier 1 and occasional Tier 2 data. Bookkeeper handles Tier 2 and Tier 3. HR handles Tier 3 across many categories. Engineering handles Tier 2 and Tier 3 source code. Sales handles Tier 2 customer data and occasional Tier 3 contract material. Executive handles Tier 3 across most categories.
Step 3: map data tiers to tools and tiers. Tier 1 staff may not need a paid AI license at all – free consumer AI is usable for Tier 1 data, subject to your prohibited-tools list. Tier 2 and Tier 3 staff need licensed AI on the right tier with the right contract. Many SMBs settle on Copilot for M365 for staff already in the M365 ecosystem (because the licensing piggybacks on what is already in place) plus ChatGPT Team or Claude for Work for staff who need broader AI capabilities.
Step 4: budget realistically. Copilot for M365 currently lists around $30 per user per month for the licensed Copilot product (separate from base M365 licensing). ChatGPT Team is around $25-$30 per user per month. Claude for Work is similarly priced. GitHub Copilot Business is around $19 per user per month. For a 20-person SMB where 10 staff need licensed AI, that works out to roughly $200-$300 per month at one license each, and closer to $400 if some staff need a second tool – meaningful but not catastrophic. Compare against the cost of an audit finding, an insurer renewal denial, or a client questionnaire that you cannot answer.
Step 5: write it down. The decision belongs in your AI acceptable use policy and in your approved-tools list. “We use Copilot for M365 for general office work, ChatGPT Team for marketing and analyst work, and GitHub Copilot Business for engineering. Free consumer AI tools are approved for Tier 1 (public) data only.”
Step 6: enforce technically where you can. For staff on managed devices, you can block consumer AI URLs at the DNS or web-filter layer, push the approved Copilot or ChatGPT extensions, and prevent installation of competing AI tools. For unmanaged devices and personal-account use, you cannot enforce technically – you rely on policy, training, and disclosure obligations. This is fine. Policy plus training plus a credible enforcement story handles most of the risk.
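The Step 4 budget is easy to sanity-check in a few lines. The sketch below uses the approximate per-user prices quoted in this article, which should be verified against current vendor pricing. The seat mix is hypothetical, and `PRICES` and `monthly_cost` are illustrative names.

```python
# Approximate per-user monthly prices (USD) as quoted in this article,
# stored as (low, high). Verify current pricing before budgeting.
PRICES = {
    "copilot_m365": (30, 30),
    "chatgpt_team": (25, 30),
    "github_copilot_business": (19, 19),
}


def monthly_cost(seats: dict[str, int]) -> tuple[int, int]:
    """Return the (low, high) monthly cost for a seat count per product."""
    low = sum(PRICES[product][0] * count for product, count in seats.items())
    high = sum(PRICES[product][1] * count for product, count in seats.items())
    return low, high


# Hypothetical mix for 10 licensed staff in a 20-person business.
seats = {"copilot_m365": 6, "chatgpt_team": 3, "github_copilot_business": 1}
low, high = monthly_cost(seats)
print(f"${low}-${high} per month")  # prints "$274-$289 per month"
```

Changing the seat mix is the quickest way to see how the total moves when a role needs a second tool or when a department can stay on free consumer AI for Tier 1 work.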
Worked example: a 25-person professional services firm
A short illustration of how the framework applies in practice.
A 25-person accounting practice has been on Microsoft 365 Business Standard for three years. They have an external IT provider but no internal IT staff. The owner becomes aware that staff are using ChatGPT (free) to draft client communications, summarize tax notices, and check tax-code interpretations. The bookkeeper has pasted client P&Ls into ChatGPT free to “ask what looks unusual.” The receptionist has used Gemini consumer to draft client onboarding emails. The owner herself has used ChatGPT to draft sensitive HR communications.
Applying the framework:
- Data tier audit. Most work involves Tier 2 (internal practice operations) and Tier 3 (client financial data, regulated under various tax-preparer rules and contractual confidentiality). PHI is not in scope. PCI is not in scope. State PII is in scope. Client confidentiality is in scope on every client engagement letter.
- Tool decisions. Upgrade existing M365 Business Standard to Business Premium where staff need Copilot, plus license Copilot for M365 ($30 per user per month) for the 12 staff who handle client documents directly. Add ChatGPT Team for the four staff who do broader research and analysis work where the Copilot integration is less helpful. Reception and admin staff stay on no licensed AI but are told they may use free consumer AI for Tier 1 (public marketing copy, generic templates) only.
- Prohibited tools. Free ChatGPT, free Gemini, free Copilot, free Claude.ai for any Tier 2 or Tier 3 work. Browser AI extensions that read page content. Personal-account AI on work devices.
- Approved data flow. Client financial data, draft tax communications, internal HR documents – in Copilot for M365 only. Generic drafting, public-domain research – free consumer AI is fine on Tier 1.
- Policy. AI acceptable use policy written, signed by all staff at the next staff meeting. Owner emails the firm’s main institutional client to disclose the AI tools in use and confirm they meet the confidentiality terms of the engagement.
- Technical controls. DNS-layer block of free consumer ChatGPT, Gemini, and Claude on managed devices. Copilot for M365 deployed and configured. ChatGPT Team workspace provisioned with SSO.
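- Total cost. Around $460-$480 per month in incremental AI licensing (12 Copilot for M365 seats at $30 plus four ChatGPT Team seats at $25-$30), before any Business Premium upgrade cost. Half a day of admin work. Two-hour staff meeting plus 30-minute training.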
The before-state and after-state are completely different from a risk perspective, at a cost that is small compared to one client losing trust or one audit finding.
How this fits with your cybersecurity policy and broader IT governance
Your AI data-handling rules do not exist on their own – they live inside the cybersecurity and IT policy framework you already have (or should have). The structural relationships:
- The cybersecurity policy sets the data classification framework that the AI policy uses. If your cybersecurity policy already has three or four data tiers defined, reuse them; do not invent a parallel AI-specific classification.
- The AI acceptable use policy is where the rules from this article get codified for staff and signed at onboarding.
- The IT policy for remote workers covers software installation and personal-device rules that intersect with shadow AI (browser AI extensions, personal-account AI on work laptops).
- The BYOD policy covers what staff can do with company data on personal devices, which is where most personal-account AI use happens.
- The M365 security hardening checklist covers the tenant configuration prerequisites that need to be in place before Copilot for M365 is enabled.
- The cyber insurance for small business overview matters because more insurers are now asking about AI use in renewal questionnaires; documented data handling rules are part of what they want to see.
If you are HIPAA-covered, the HIPAA cybersecurity requirements article covers the baseline that any AI use has to sit on top of. The AI-specific HIPAA detail (which vendors have signed BAAs, what counts as a Business Associate relationship for AI vendors, how to document AI use for audit purposes) is a separate article in this series, coming next.
What this looks like to auditors, insurers, clients, and regulators
How outside parties will read your AI data-handling posture is worth thinking about explicitly, because the question shows up in different forms from each of them.
- Auditors (SOC 2, ISO, internal IT audit). Want to see a documented data classification, a documented approved-tools list, evidence that the policy is communicated, evidence that staff have signed it, and evidence that violations are tracked. The AI piece does not need its own separate audit program for most SMB audits – it can sit inside the existing IT general controls work, as long as it is documented.
- Insurers (cyber insurance renewals). Are increasingly asking explicit questions: “do you have an AI acceptable use policy?”, “what AI tools are approved?”, “how do you prevent staff from using AI on regulated data?”. The right answer is short, written, and signed. The wrong answer is “we tell people to be careful.”
- Clients (security questionnaires for B2B engagements). Are starting to ask “do you use AI to process our data?” and “what AI tools do you use?”. The truthful, defensible answer requires you to know – which is the data classification and approved-tools list at work.
- Regulators (sector-specific). For HIPAA-covered, finance-regulated, education-regulated, or government-contractor businesses, the regulatory framework already governs where data flows. AI is just another flow. Regulators will ask about an AI vendor the same way they ask about any other data processor.
The pattern: outside parties do not expect you to have eliminated AI risk. They expect you to have documented data handling, named approved tools, and a way to enforce the rules. Documented is the keyword.
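Evidence that staff have signed the policy is easier to produce if the sign-off log can be checked against the staff roster. The sketch below assumes a very simple log, a mapping from staff name to the date of the most recent signed acknowledgment. The function name `missing_signoffs` and the data shapes are hypothetical, so adapt them to however you actually store sign-offs.

```python
from datetime import date


def missing_signoffs(roster, signoffs, policy_effective):
    """List staff who have not acknowledged the current policy version.

    roster: iterable of staff names
    signoffs: dict mapping staff name -> date of the most recent signed acknowledgment
    policy_effective: date the current policy version took effect
    """
    missing = []
    for name in roster:
        signed_on = signoffs.get(name)
        # No record at all, or a signature older than the current version, both count as missing.
        if signed_on is None or signed_on < policy_effective:
            missing.append(name)
    return sorted(missing)
```

Running this before a renewal questionnaire or an audit request gives you a short list of people to chase, rather than a claim you cannot back up.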
10 common mistakes in AI data handling
The patterns that show up repeatedly when SMBs hit AI data exposure incidents:
- Assuming “ChatGPT” means one product. ChatGPT Free, Plus, Team, Enterprise, and the API are different products under different data terms. Approving “ChatGPT” without naming the tier is an unenforceable rule.
- Confusing free Copilot with Copilot for Microsoft 365. The names are nearly identical. The data protections are not. Audit which staff are using which.
- Skipping data classification. Without a tiering rule, every AI use case is decided ad hoc by the employee in the moment. That is the shadow AI baseline.
- Banning AI broadly without offering an approved alternative. Staff continue using AI; they just stop telling anyone. Shadow AI goes from bad to worse, because now even the staff who would have disclosed their use hide it.
- Approving consumer AI for “low-risk” work without defining what low-risk means. Tier 1 needs an explicit definition. Otherwise “low-risk” expands to whatever the employee thinks is low-risk in the moment.
- Ignoring AI features inside approved SaaS. Slack AI, Zoom AI Companion, Notion AI, HubSpot Breeze, Salesforce Einstein. Each one ingests data and routes it through an AI service under terms the SMB never reviewed. They are often the largest shadow AI exposure in the business.
- Forgetting personal-account AI on work devices. The employee is signed into their personal ChatGPT account on their work laptop. Everything pasted lives in their personal ChatGPT history, outside the business’s controls, with no way for the business to review or delete it.
- Treating browser AI extensions as harmless. A “summarize this page” extension is reading the contents of every page the user views – including CRM, payroll dashboards, M365 admin center – and sending them to a third-party AI service.
- Not training on the gray areas. Staff understand “do not paste client SSNs into ChatGPT.” Staff do not know whether they can use Copilot to summarize a meeting where a client said something off the record. The gray cases are where the training time is needed.
- Failing to disclose AI use where contracts require it. Many client master service agreements and confidentiality terms either require disclosure of subprocessors or prohibit sharing data with third parties without consent. AI use frequently triggers one or both. The disclosure obligation is real and is rarely thought about.
Time to set this up
A workable AI data-handling setup for a typical 10-50 person SMB:
| Phase | What happens | Time |
|---|---|---|
| Data classification | Define three tiers, name examples per tier, write the rule | 2-4 hours |
| Vendor research | Confirm current data terms for the tools you are considering | 4-8 hours |
| Tool selection and licensing | Decide which tiers of which tools, secure licensing, set up workspaces | 1-2 weeks (including procurement and SSO setup) |
| Policy writing | Write the AI acceptable use policy including the data-tier-to-tool mapping | 2-4 hours for the working draft |
| Technical controls | DNS or web filter to block unapproved consumer AI on managed devices, deploy approved extensions | 1-2 days |
| Staff communication | Single staff meeting + manager follow-ups, signed acknowledgment | 2 hours of meetings, 1 week of follow-up |
| Documentation for auditors / insurers / clients | Save the policy, the approved-tools list, the staff sign-off log, the vendor BAA or data agreement copies | Ongoing |
| Total elapsed time | From “we should do this” to “we have done this” | 2-4 weeks |
The four-week version is the realistic version for a small business. The two-week version is the one where someone has already drafted the policy and just needs sign-off and licensing.
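For the technical controls phase, the blocklist itself is usually the easy part. As a rough sketch, the function below turns a list of hostnames into hosts-file style lines that many DNS filters and sinkhole tools can import. The function name `hosts_blocklist` is hypothetical, and any hostnames you feed it should be checked against each vendor's current domains rather than taken from this example.

```python
def hosts_blocklist(blocked, allowed=frozenset()):
    """Build hosts-file lines that point blocked hostnames at 0.0.0.0.

    blocked: iterable of hostnames for unapproved consumer AI tools
    allowed: hostnames that an approved tool also needs, so they are never blocked
    """
    allowed_norm = {h.strip().lower() for h in allowed}
    # Normalise case and whitespace, then de-duplicate.
    hosts = {h.strip().lower() for h in blocked}
    return [
        f"0.0.0.0 {host}"
        for host in sorted(hosts)
        if host and host not in allowed_norm
    ]
```

The `allowed` set is there because a DNS block works per hostname. If an approved paid workspace is served from the same hostname as the free tier (an assumption to verify with the vendor), a DNS rule cannot tell the two apart, and separating them needs a different control such as SSO enforcement or tenant restrictions.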
What is next in this content series
This article covered the data side – what each major AI tool does with your inputs, why the consumer-vs-enterprise distinction matters, and how to classify your data so the policy work actually has something to hang on. The follow-ups go deeper into specific cases:
- AI and HIPAA – what healthcare businesses need to know, including which AI vendors have signed BAAs, what a Business Associate relationship means for AI vendors, and how to document AI use for audit purposes
- AI security risks – prompt injection, data leakage, AI-generated phishing, deepfakes, voice cloning
- Microsoft Copilot for small business specifically, including the M365 permission hygiene that has to happen before rollout
- How to build a lightweight AI governance framework that does not require enterprise-scale process
- How to evaluate any AI tool for business use – the questions to ask, the contract terms to look for, the data flows to map
If you have not read them yet, the shadow AI wake-up call and the AI acceptable use policy template are the upstream pieces. Read them before this one if you are starting from scratch, or after this one if you came in through this article.
If your AI work is happening inside a broader managed cybersecurity engagement, the managed cybersecurity services for small business overview is the parent context.
How Sequentur can help
If you want help building the data classification, choosing the right AI licensing tier across your team, configuring Copilot for Microsoft 365 securely, or just a second pair of eyes on what you have already drafted, schedule a call.