Network monitoring for small business: what to watch and how
Most small business outages get discovered by users, not by tools. The first sign that the firewall is dropping packets is an angry message from sales. The first sign that the internet circuit has degraded to 30 percent of its rated speed is a manager asking why video calls keep freezing. The first sign that a switch is overheating is the conference room going dark. In every one of those cases, the problem was visible in monitoring data thirty minutes to thirty hours before a human noticed – if monitoring had been running.
Network monitoring is the difference between learning about problems from your team and learning about problems from your tools. It does not prevent every incident, but it converts a meaningful percentage of “the network is down” emergencies into “the network is degrading and we have an hour to fix it before anyone notices” tickets. That gap is the difference between a Tuesday morning that ends on time and one that does not.
This article walks through what to actually monitor at SMB scale, what alerts matter versus what becomes noise, the free-versus-commercial tooling landscape, how RMM platforms handle the same job for MSP clients, and the realistic answer for who should run the monitoring stack. Who it is for: an internal IT generalist deciding whether to stand up monitoring themselves, a business owner trying to understand what an MSP is selling them under the “monitoring” line item, or a technical decision-maker scoping a network refresh. Monitoring is one component of the broader operational engagement covered in managed network services for small business.
Short answer
A working SMB network monitoring setup watches five things: uptime of critical devices (firewall, core switch, key servers, internet circuit), bandwidth utilization on the WAN and key internal links, device health (CPU, memory, temperature, port errors), latency to a handful of external targets, and security-relevant events (failed logins, configuration changes, firmware updates). It runs 24×7, alerts on thresholds tuned to avoid noise, and routes alerts to someone who actually responds within the agreed window. At SMB scale this is almost always delivered through an MSP’s RMM platform rather than a self-hosted stack – it is technically possible to do it yourself, but it is rarely the right use of internal IT capacity.
Network monitoring at a glance
| Layer | What to watch | Why it matters | Alert threshold |
|---|---|---|---|
| Uptime | Ping/availability of firewall, core switch, key servers, WAN | Outages need detection, not discovery | Down over 2 min |
| Bandwidth | WAN utilization, top talkers, inter-VLAN throughput | Saturation kills calls and uploads | Sustained over 80% for 5 min |
| Device health | CPU, memory, temp, port errors, fan status | Aging hardware fails predictably | CPU over 80% sustained, temp over spec |
| Latency | Round-trip to firewall, ISP gateway, cloud apps, M365 | Latency drift is the symptom of half the user complaints | Over 100ms or 2x baseline |
| Wireless | AP up/down, channel utilization, client counts | One bad AP kills a floor | AP down over 5 min, channel over 70% |
| Security events | Failed admin logins, firmware/config changes, IDS/IPS hits | Often the first detectable sign of compromise | Tuned per device |
| Logs | Syslog from firewall and switches into a central store | Investigation needs history attackers cannot delete | N/A (alerting on patterns) |
Coverage starts at the top of the table and gets harder going down. Uptime monitoring is easy and high value. Wireless and security event monitoring need more setup and tuning. Log centralization is the foundation everything else sits on, yet it is the item most often skipped.
What network monitoring actually covers
The term “network monitoring” gets used loosely. At SMB scale it covers five distinct functions that often run on the same tool but answer different questions.
Uptime monitoring answers “is the device responding right now.” It uses ICMP ping, SNMP availability checks, or HTTP/HTTPS probes against each critical device on a one-to-five-minute interval. Cheap, simple, and the foundation of every monitoring system. The output is binary – up or down – plus historical availability percentages over time. If your monitoring does nothing else, it should at least do this.
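To make the shape of this concrete, here is a minimal sketch of an uptime probe in Python. The device list, addresses, and alert hook are placeholders; it shells out to the system ping command once a minute and only declares a device down after two consecutive misses, which is the same "down over two minutes" logic the alert table above uses.

```python
#!/usr/bin/env python3
"""Minimal uptime probe: ping a list of critical devices on an interval and
flag a device as down only after consecutive misses, so one lost ping never pages."""
import subprocess
import time

DEVICES = {
    "firewall-lan": "192.168.1.1",   # placeholder addresses
    "core-switch": "192.168.1.2",
    "file-server": "192.168.1.10",
}
INTERVAL_SEC = 60          # poll once a minute
MISSES_BEFORE_ALERT = 2    # two consecutive misses ~ "down over 2 minutes"

def is_up(ip: str) -> bool:
    """Send one ping with a 2-second timeout; returncode 0 means a reply came back."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", ip],   # Linux flag set; other platforms differ
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0

def alert(name: str, ip: str) -> None:
    # Placeholder: wire this to email, chat, or the ticketing system.
    print(f"ALERT: {name} ({ip}) unreachable for {MISSES_BEFORE_ALERT} consecutive checks")

misses = {name: 0 for name in DEVICES}
while True:
    for name, ip in DEVICES.items():
        if is_up(ip):
            misses[name] = 0
        else:
            misses[name] += 1
            if misses[name] == MISSES_BEFORE_ALERT:
                alert(name, ip)
    time.sleep(INTERVAL_SEC)
```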
Bandwidth and traffic monitoring answers “what is using the network and how much.” It pulls byte counters from switch ports and firewall interfaces via SNMP, or runs flow-based monitoring (NetFlow, sFlow, IPFIX) on the firewall to see source/destination/port/protocol breakdowns. The output is graphs of utilization over time and a “top talkers” list of which devices or users are consuming the most. This is what answers “why was the network slow at 2pm Tuesday.”
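The arithmetic behind those utilization graphs is worth seeing once. The sketch below assumes you already have two readings of an interface octet counter (SNMP ifHCInOctets or a firewall API equivalent) and turns them into percent utilization of the link; the sample numbers are made up.

```python
def utilization_pct(octets_t1: int, octets_t2: int, seconds: float,
                    link_speed_bps: int, counter_bits: int = 64) -> float:
    """Convert two readings of an interface octet counter taken `seconds` apart
    into percent utilization of the link. Handles one counter wrap between polls."""
    delta = octets_t2 - octets_t1
    if delta < 0:                        # counter wrapped between the two polls
        delta += 2 ** counter_bits
    bits_per_sec = (delta * 8) / seconds
    return 100.0 * bits_per_sec / link_speed_bps

# Example: two polls of the WAN interface 300 seconds apart on a 200 Mbps circuit
pct = utilization_pct(9_500_000_000, 12_900_000_000, 300, 200_000_000)
print(f"WAN inbound utilization over the last 5 minutes: {pct:.1f}%")
```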
Device health monitoring answers “is the equipment going to fail soon.” It tracks CPU, memory, temperature, fan speed, power supply status, and per-port error counters on the firewall, switches, and access points. Aging network equipment usually shows degradation in these numbers weeks before it actually fails – rising port error counts on a switch, a fan that has started reporting variable RPM, a firewall whose memory utilization has been creeping up for a month. Catching these patterns is the difference between scheduled replacement and emergency replacement.
Latency and performance monitoring answers “does the network feel fast or slow from the user’s perspective.” It pings a handful of targets (the firewall, the ISP gateway, M365 endpoints, the cloud apps the business actually uses) and graphs round-trip times. Most user complaints about “slow internet” turn out to be latency drift on a specific path rather than bandwidth saturation. Latency graphs answer this within seconds.
Security event monitoring answers “is something unusual happening that I should care about.” It pulls log events from the firewall, switches, and APs – failed admin logins, firmware changes, configuration changes, IDS/IPS hits, blocked outbound connections to known-bad destinations. This overlaps with proper SIEM and is usually shallower in a monitoring tool, but the basics – “someone is brute-forcing the firewall admin page from a foreign IP” – belong in the alerting layer of any competent SMB setup.
These five functions are what a serious answer to “we have monitoring” should include. If a vendor or an MSP says they monitor the network, ask which of these five they cover, on what cadence, with what alert routing.
What to actually watch
The category list above is what to watch in general. The specific items below are what to set up first if you are building monitoring from scratch.
Critical infrastructure uptime. Ping the firewall WAN interface, the firewall LAN interface, the core switch management IP, each access point management IP, and the management IP of any key server or NAS. Alert if any of these are down for more than two minutes.
Internet circuit health. Either a direct interface check on the firewall WAN, or an external uptime monitor that probes a public host from outside. Both are valuable – the firewall view tells you if the link is up, the external view tells you if traffic actually reaches the internet. ISPs and circuits have failure modes where the link is up but traffic is not flowing.
WAN bandwidth utilization. Graph the WAN interface in/out bytes on the firewall over time. Alert on sustained 80 percent or higher for more than five minutes. Spikes are normal; sustained saturation is not. Track “top talkers” so you can answer who or what is using the bandwidth when saturation happens. This is the first place to look when users complain about a slow network.
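The "sustained, not bursty" rule is easy to encode. A minimal sketch, assuming one utilization sample per minute from whatever poller you use: keep the last five samples and alert only when every one of them is at or above 80 percent.

```python
from collections import deque

POLL_SEC = 60           # one utilization sample per minute
WINDOW = 5              # five samples = five minutes
THRESHOLD_PCT = 80.0

recent = deque(maxlen=WINDOW)

def check_wan_sample(utilization_pct: float) -> bool:
    """Feed each new WAN utilization sample; return True only when the last five
    minutes have all been at or above the threshold, so short bursts never fire."""
    recent.append(utilization_pct)
    return len(recent) == WINDOW and min(recent) >= THRESHOLD_PCT

# Example: a one-minute spike does not alert, five saturated minutes do
for sample in [95, 40, 82, 85, 88, 91, 84]:
    if check_wan_sample(sample):
        print(f"ALERT: WAN sustained above {THRESHOLD_PCT}% for {WINDOW} minutes")
```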
Switch port errors. Pull error counters from every managed switch port. Rising CRC errors on a port are almost always a bad cable or a failing port. Alert on any port that crosses a threshold (a static non-zero error count is fine; an error count that grows by hundreds per hour is not). The depth on why this matters is in managed switches for small business: what they are and when you need one.
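That distinction between a static count and a growing one is just a rate calculation over two readings. A short sketch with hypothetical counter values and an assumed threshold of 100 new errors per hour:

```python
def port_error_rate(errors_t1: int, errors_t2: int, hours: float) -> float:
    """Errors accumulated per hour between two readings of a port error counter
    (for example SNMP ifInErrors)."""
    return (errors_t2 - errors_t1) / hours

rate = port_error_rate(1_240, 1_960, 2.0)    # hypothetical readings two hours apart
if rate > 100:                               # assumed threshold: >100 new errors/hour
    print(f"WARN: port error counter growing at {rate:.0f}/hour; check the cable and SFP")
```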
Access point health. Each AP should report up/down state, current client count, channel utilization, and signal-to-noise ratio. Channel utilization at 70 percent is a clear sign of WiFi congestion, usually before users complain. AP down events are critical alerts – one AP failing means one area of the office loses coverage. The wireless side of the platform decision is covered in business WiFi vs consumer WiFi: why it matters for your office – consumer routers do not produce meaningful monitoring data.
External latency targets. Ping 8.8.8.8 or 1.1.1.1, the ISP gateway, the M365 region endpoint, and any business-critical SaaS app. Graph round-trip time. Alert on sustained latency over 100ms or 2x baseline. Latency drift is what most “video calls keep freezing” reports turn out to be.
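If you want a quick way to collect these numbers without a full monitoring tool, TCP connect time to port 443 is a serviceable latency proxy that needs no special privileges. The sketch below uses placeholder targets and made-up baseline values; the alert condition mirrors the 100ms-or-2x-baseline rule above.

```python
import socket
import time

# Hypothetical targets: public resolvers plus the SaaS endpoints the business uses
TARGETS = [("1.1.1.1", 443), ("8.8.8.8", 443), ("outlook.office365.com", 443)]
# Assumed baselines, normally derived from a couple of weeks of history
BASELINE_MS = {"1.1.1.1": 12.0, "8.8.8.8": 14.0, "outlook.office365.com": 35.0}

def tcp_rtt_ms(host: str, port: int, timeout: float = 2.0) -> float | None:
    """Measure TCP connect time as a latency proxy (no raw-socket privileges needed)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.monotonic() - start) * 1000.0
    except OSError:
        return None   # unreachable counts as a miss, not a latency sample

for host, port in TARGETS:
    rtt = tcp_rtt_ms(host, port)
    if rtt is None:
        print(f"{host}: no response")
    elif rtt > 100 or rtt > 2 * BASELINE_MS[host]:
        print(f"{host}: {rtt:.0f} ms (baseline {BASELINE_MS[host]:.0f} ms): investigate")
    else:
        print(f"{host}: {rtt:.0f} ms ok")
```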
Firewall security events. Failed admin logins (especially from external IPs), configuration changes (with the user that made them), firmware update events, IDS/IPS high-severity hits, geo-blocked traffic spikes. Most modern firewalls can syslog these events to a central collector or directly to an alerting system.
DHCP scope utilization. If a VLAN runs out of DHCP addresses, new devices stop joining and existing devices keep working. The symptom looks like “new laptops can’t connect.” Alert when any DHCP scope crosses 75 percent utilization so the scope can be expanded before the failure.
Backup job status. Strictly speaking this is backup monitoring, not network monitoring, but most monitoring stacks include it and the question “did backups run successfully last night” is the single most important morning check. Alert on any failed or skipped backup job.
This list does not say “everything.” It says “everything that matters at SMB scale, where attention is the constraint.” Adding more monitored items beyond this list is fine as long as the alerts stay tuned and actionable.
What alerts matter versus what becomes noise
The single biggest failure mode of SMB monitoring is alert fatigue. A monitoring system that generates 200 alerts a day teaches everyone to ignore alerts; the one that matters is buried in the 199 that did not. Tuning the alert layer is more important than tuning the monitoring layer.
The rule that works in practice: every alert should be actionable, and the action should be obvious. If an alert fires and the responder’s reaction is “I do not know what this means” or “I am not going to do anything about this,” the alert is misconfigured.
Concrete examples of alerts that earn their keep:
- Critical infrastructure down for more than two minutes (action: investigate immediately)
- WAN saturation sustained for more than five minutes (action: identify top talker, decide if intervention is needed)
- Switch port errors rising rapidly (action: check the cable, plan port move or replacement)
- Firewall failed admin logins exceeding a threshold from an external IP (action: confirm geo-block is in place, change admin credentials if needed)
- Backup job failed last night (action: investigate the failure, re-run the job)
- AP down for more than five minutes (action: check power and uplink, replace if dead)
- DHCP scope above 75 percent utilization (action: expand the scope before it fills)
Concrete examples of alerts that become noise and should be silenced or aggregated:
- Single ping miss on a healthy device (network ping packets are lost routinely; alert on patterns, not individual events)
- Every routine config change (review them in a daily digest, do not page the on-call for each one)
- Low-severity IDS/IPS hits (the internet is constantly being scanned; medium-and-above is the threshold)
- Bandwidth spikes lasting less than a minute (real saturation is sustained, not bursty)
- A failed login from a legitimate user mistyping their password (alert on patterns, not individual events)
- Every firmware availability notification (digest weekly; do not page)
- Every device reboot in a planned maintenance window (suppress alerts during the window)
The pattern: page on conditions that need a human response in the next hour. Digest the rest. The metric of a healthy monitoring system is “alerts per week the on-call had to act on” – if that number is in single digits, the system is well-tuned. If it is in the hundreds, the system has been training the on-call to ignore it.
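The page-versus-digest decision is simple enough to express as a lookup table. A minimal sketch with hypothetical alert types, including the maintenance-window suppression from the list above:

```python
from enum import Enum

class Route(str, Enum):
    PAGE = "page"       # wake someone up now
    DIGEST = "digest"   # roll into the daily or weekly summary

# Hypothetical rule table: alert type -> how it is delivered
ROUTING = {
    "device_down": Route.PAGE,
    "wan_saturation_sustained": Route.PAGE,
    "backup_failed": Route.PAGE,
    "config_change": Route.DIGEST,
    "ids_low_severity": Route.DIGEST,
    "firmware_available": Route.DIGEST,
}

def route_alert(alert_type: str, in_maintenance_window: bool = False) -> Route | None:
    """Return where an alert goes; suppress everything during planned maintenance."""
    if in_maintenance_window:
        return None
    return ROUTING.get(alert_type, Route.DIGEST)   # unknown alerts default to the digest

print(route_alert("device_down"))           # Route.PAGE
print(route_alert("config_change"))         # Route.DIGEST
print(route_alert("device_down", True))     # None: suppressed during the window
```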
A useful additional discipline: every alert that gets ignored gets reviewed. If the on-call sees an alert and decides not to act, that decision is data. Either the alert was misconfigured (silence or aggregate it) or there is a real condition that needs investigation. A monthly review of “alerts the on-call did not act on” is how the system stays tuned over time.
Free versus commercial monitoring tools
The tool landscape at SMB scale falls into four buckets.
Free and open-source. LibreNMS, Zabbix, Nagios Core, Cacti, Grafana with Prometheus, Uptime Kuma. All capable, all require Linux administration skill to stand up and maintain. The license is free; the operational cost is real. A free monitoring stack that takes 10 hours a month to maintain is not cheaper than a $200/month commercial tool if the engineer’s time costs more than $20/hour. Realistic when the business has a competent Linux generalist who genuinely enjoys this kind of work and has the time. Unrealistic when the IT generalist has 12 other priorities.
Commercial all-in-one network monitoring. PRTG (Paessler), SolarWinds NPM, ManageEngine OpManager, Auvik, Domotz. Designed for IT teams and MSPs. Auvik and Domotz target SMBs and MSPs specifically and are easy to deploy; PRTG and OpManager are more configurable but require more setup. Pricing typically runs $30-$200 per month for SMB-scale deployments depending on device count.
Cloud-native uptime monitoring. Uptime Robot, Pingdom, StatusCake, Better Stack. These watch availability and basic latency from outside your network, alerting if your public-facing services or specific external probes fail. Free tiers exist (Uptime Robot’s free tier covers most small business needs). Useful as a complement to internal monitoring – they answer “can the outside world reach my services” while internal monitoring answers “are my internal devices healthy.”
RMM platforms with network monitoring built in. ConnectWise Automate, NinjaOne, Datto RMM, Atera, N-able N-central, Syncro. These are designed for MSP delivery but some sell direct to SMBs. The network monitoring layer in an RMM is usually competent for SMB needs – uptime, basic SNMP, device health on managed endpoints – and lives alongside the endpoint management, patching, and remote access features. The depth on this category is in what is remote monitoring and management (RMM) and why MSPs use it.
A realistic SMB picture: either pay for a commercial all-in-one tool and run it internally (rare, expensive in time), use an RMM (which means engaging an MSP), or pair a free internal tool with a free external uptime monitor for a minimum viable stack. The “free tool that nobody maintains” trap is the worst outcome – monitoring data that nobody looks at is functionally identical to no monitoring at all.
| Tool category | Typical cost | Setup effort | Maintenance | Best for |
|---|---|---|---|---|
| LibreNMS / Zabbix / Grafana stack | Free + server | 1-2 weeks | 4-10 hrs/month | Linux-confident in-house IT |
| PRTG / OpManager / SolarWinds | $30-$200/mo | 3-7 days | 1-3 hrs/month | Mid-market with dedicated IT |
| Auvik / Domotz | $50-$300/mo | 1-2 days | Under 1 hr/month | SMBs and MSPs |
| Uptime Robot / Pingdom (cloud) | Free to $30/mo | Under 1 hr | None | External-only availability |
| RMM (NinjaOne / Datto / Atera) | $5-$15/endpoint/mo | Done by MSP | None internal | MSP-managed SMBs |
How RMM platforms handle network monitoring for MSP clients
When an SMB engages an MSP under a managed services agreement, network monitoring is almost always part of the scope. The MSP runs the monitoring stack centrally – usually an RMM platform with bolt-on network monitoring, sometimes a dedicated network monitoring tool like Auvik or Domotz alongside the RMM. From the SMB’s perspective, monitoring is “in the contract”; from the MSP’s perspective, it is one of the operational backbones of how they deliver service.
What this looks like in practice:
Deployment. The MSP places a small monitoring probe on the SMB’s network – either a virtual appliance on a workstation, a dedicated mini PC, or a software agent on a server. The probe discovers devices, polls SNMP, and reports back to the MSP’s cloud-hosted monitoring console.
Coverage. Endpoints (laptops, desktops, servers) are monitored through the RMM agent. Network devices (firewall, switches, APs) are monitored through SNMP via the probe. Internet circuits are monitored from both sides – inside the network from the firewall, and outside the network from an external probe. The full coverage area is typically defined in the MSA.
Alerting. Alerts fire into the MSP’s ticketing system. Critical alerts page the on-call engineer; lower-priority alerts queue for the next business day. From the SMB’s perspective, this means most network problems get a ticket opened before anyone at the SMB notices, and many get resolved before users see them.
Reporting. The MSP produces periodic reports – usually monthly – showing uptime, alert volume, top talkers, capacity trends, and any recurring patterns. Quarterly business reviews are where capacity decisions get made (do we need more bandwidth, do we need to replace a switch, is the WiFi at saturation).
Limitations. RMM network monitoring is competent for the SMB use case but is not as deep as dedicated network monitoring tools. Application-layer monitoring (is the database slow, is the login server responding), synthetic monitoring (running scripted user flows against critical apps), and distributed tracing are outside the typical RMM scope. SMBs that need those capabilities either pay extra for a layered tool or accept the gap.
The MSP-delivered model is the realistic answer for most SMBs. The economics of building and maintaining a competent monitoring stack internally rarely make sense below 200 employees and an IT team of three or more. The MSP amortizes the monitoring platform cost across all their clients and runs it as their core operation, which is what makes the price work.
Proactive monitoring versus reactive troubleshooting
The point of monitoring is to convert outages into degradations and degradations into scheduled maintenance.
The reactive path looks like this: a user reports the network is slow, IT investigates, IT tries a few things, IT escalates to the ISP, the ISP runs tests, the ISP says the circuit is fine, IT looks again, IT finds a switch port with a failing SFP module, IT replaces it, three hours have passed. Three more users have complained. The CEO is frustrated. This happens once or twice a month in environments without monitoring.
The proactive path looks like this: monitoring shows port error counts rising on switch 1, port 18, over the past week. The MSP receives a low-priority alert. The MSP schedules a maintenance window for Friday evening and replaces the SFP module. Nobody notices.
The difference is not the technical work – it is the same SFP module, the same five-minute replacement procedure. The difference is whether the work happens on Tuesday morning during the user complaint or on Friday evening during planned maintenance.
Three categories of incident that monitoring catches before users notice:
Slow degradations. Port error counts rising, memory utilization creeping up, firmware getting further behind. These are detectable weeks in advance and easy to schedule around.
Sudden but localized failures. An AP goes down, a switch port loses link, a backup job fails. Users do not necessarily notice (one AP failing is degraded coverage, not no coverage), and monitoring catches it before the next person walks into the dead zone.
Bandwidth and latency issues. A new cloud app gets rolled out and starts consuming 60 percent of WAN bandwidth. Monitoring sees it immediately; users feel it as “the network is suddenly slow.” Catching this in the bandwidth graphs lets IT have a conversation about either bandwidth upgrade or app prioritization before it becomes a complaint.
Things monitoring does not catch:
Hard, sudden failures. A switch dies, the internet circuit fails, a power supply blows. Monitoring will tell you immediately that it happened, but it will not have warned you in advance. The defense for these is redundancy, not monitoring – see how to set up redundant internet for your business for the WAN side.
Application bugs that look like network problems. A SaaS app having a regional outage, a misconfigured DNS record, an expired certificate. Monitoring may see latency or error rate change but the root cause is not in the network. Diagnostic experience matters more than monitoring data here.
Zero-day attacks before signatures exist. Monitoring sees the symptoms (unusual outbound connections, spike in failed logins) but not the attack itself. Combining monitoring with DNS filtering, endpoint detection and response, and the network security checklist is what closes the gap.
A realistic monitoring stack converts roughly 50-70 percent of incident-class events into degradation-class events at SMB scale. That is the ROI conversation. The remaining 30-50 percent still happen, but they resolve faster because monitoring gives the responder context the moment the incident starts.
Log retention and central log collection
Monitoring data is real-time. Log retention is historical. Both matter, and the historical side is the one most often skipped.
Every firewall, switch, and AP generates log events – connection attempts, configuration changes, authentication events, errors. By default these logs live on the device itself, in a circular buffer that overwrites itself every few hours to a few days depending on traffic. When an incident happens and someone asks “what was the firewall doing at 3am on Tuesday last week,” the answer is “those logs are gone.” The fix is centralized log collection.
The minimum viable setup at SMB scale: a syslog server (Linux VM or commercial appliance) that all network devices send their logs to, with at least 90 days of retention. Better setups run a small SIEM-lite tool (Graylog, Wazuh, Splunk Free) that indexes the logs and provides search. The full enterprise version is a SIEM with correlation rules and SOC-led monitoring – usually outside SMB budget unless compliance requires it.
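For a sense of how little machinery the minimum viable version needs, here is a bare-bones sketch of a UDP syslog collector in Python. It is a stand-in for rsyslog or Graylog, not a replacement: the listen port, log directory, and one-file-per-device-per-day layout are assumptions, and a production setup needs rotation, retention enforcement, and access control.

```python
#!/usr/bin/env python3
"""Bare-minimum central syslog collector: listen on UDP 514 and append every
message, timestamped, to a per-device daily file. Point each firewall, switch,
and AP's syslog destination at this host's IP."""
import socket
from datetime import datetime, timezone

LISTEN_ADDR = ("0.0.0.0", 514)     # port 514 requires root/admin privileges
LOG_DIR = "/var/log/central"       # placeholder path; create it first

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(LISTEN_ADDR)

while True:
    data, (src_ip, _) = sock.recvfrom(8192)
    now = datetime.now(timezone.utc)
    line = f"{now.isoformat()} {src_ip} {data.decode('utf-8', errors='replace').rstrip()}\n"
    # One file per sending device per day keeps 90-day retention easy to manage
    with open(f"{LOG_DIR}/{src_ip}-{now:%Y%m%d}.log", "a", encoding="utf-8") as fh:
        fh.write(line)
```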
What to do with the logs once you have them:
- Investigate incidents (the on-demand use case)
- Generate alerts on patterns (failed logins, configuration changes, blocked outbound to known-bad destinations; a sketch of the failed-login case follows this list)
- Satisfy audit and compliance requirements (some frameworks specifically require log retention; see the network security checklist item 15)
- Provide forensic evidence after a breach
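The pattern-alerting item is mostly counting events in a sliding window. A minimal sketch for the failed-admin-login case, with an assumed threshold of ten failures from one source IP in ten minutes:

```python
from collections import defaultdict, deque
import time

WINDOW_SEC = 600        # ten-minute sliding window
THRESHOLD = 10          # assumed threshold: 10 failures from one source IP

failures: dict[str, deque] = defaultdict(deque)

def record_failed_login(src_ip: str, now: float | None = None) -> bool:
    """Feed one parsed 'failed admin login' event; return True when the same
    source IP crosses the threshold within the window (likely brute force)."""
    now = time.time() if now is None else now
    events = failures[src_ip]
    events.append(now)
    while events and now - events[0] > WINDOW_SEC:   # drop events older than the window
        events.popleft()
    return len(events) >= THRESHOLD

# Example: a burst of failures from one external IP trips the alert
for i in range(12):
    if record_failed_login("203.0.113.50", now=1_000 + i * 20):
        print("ALERT: possible brute force against the firewall admin page from 203.0.113.50")
        break
```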
Attackers commonly delete local logs to cover their tracks. Centralized logging on a separate server that the attacker does not have access to is what preserves the evidence. This is why item 15 of the network security checklist – “log retention with offline copy” – is non-negotiable for any SMB doing meaningful security work.
The realistic SMB answer is the same as for monitoring itself: the MSP runs centralized logging as part of the managed service. SIEM-class capabilities are usually a separately-priced add-on; basic syslog retention is typically included. Confirm which one is in the contract.
Common network monitoring mistakes
These are the patterns that show up in audits, in failed monitoring projects, and in incident retrospectives.
- Monitoring deployed but alerts go nowhere. A monitoring system that sends emails to an unmonitored mailbox is not monitoring. The first thing to verify in any deployment is the alert routing.
- Alert fatigue from over-paging. When everything is critical, nothing is. Tune aggressively; aggregate noisy alerts into digests.
- Monitoring the easy stuff and skipping the hard stuff. Uptime checks on the firewall are easy. Port error counts on every switch, channel utilization on every AP, syslog forwarding on every device – these are the items most often skipped and most often where the real signal lives.
- No on-call rotation. Alerts that fire on Saturday night go unanswered until Monday morning. If the business does not need 24×7 response, fine, but the alert thresholds should reflect that. “Critical” alerts that nobody is going to look at for 36 hours are not actually critical.
- No baseline. No way to know what “normal” looks like. Latency monitoring without a baseline graph just produces numbers. Latency monitoring with two weeks of baseline graphs produces “this is 3x normal” – which is actionable.
- Monitoring the WAN without external uptime probes. The firewall says the WAN is up. External users say the site is down. Both can be true – the firewall is up, the ISP backbone or DNS is broken. External probes catch this.
- No log centralization. Real incidents need history. Local-only logs get overwritten or deleted; centralized logs are the only reliable evidence trail.
- Buying tools and not staffing them. A $200/month monitoring tool with no one looking at it is worse than no tool, because the budget line creates the illusion of coverage.
- Monitoring scope creep without alert review. Every new device added to monitoring should have its alert thresholds reviewed, otherwise the noise grows linearly with the device count and the signal degrades.
- Treating monitoring as set-and-forget. Network behavior changes – new applications, new traffic patterns, new failure modes. A monitoring setup that has not been tuned in 18 months is monitoring last year’s network. Quarterly review of thresholds and alert volumes is the floor.
How long the project actually takes
| Scope | Time | Notes |
|---|---|---|
| External uptime monitoring only (Uptime Robot) | 1-2 hrs | Free, instant, covers basic public-facing availability |
| RMM-delivered monitoring through MSP | Days to a week | MSP does the work; internal time mostly approval and access |
| Commercial tool deployed internally (Auvik, Domotz) | 1-2 days | Easy to deploy, harder to tune; budget another week for alert tuning |
| Commercial tool deployed internally (PRTG, OpManager) | 3-7 days | More configuration; SMB usually picks a simpler tool |
| Open-source stack (LibreNMS or Zabbix + Grafana) | 1-2 weeks | Setup is the small part; tuning and maintenance is the recurring cost |
| Centralized logging on top of monitoring | +2-5 days | Often deferred; do not skip it |
| Quarterly tuning cycle (recurring) | 2-4 hrs / quarter | Non-negotiable to keep alert quality high |
The shortest path to “actually monitored” for an SMB without an MSP is: external uptime monitor (1 hour, free), Auvik or Domotz internally (1-2 days, $50-$300/month), and a recurring quarterly review (2-4 hours). The longest path is “stand up an open-source stack ourselves” and that one tends to stall in the maintenance phase.
When to involve an MSP
Network monitoring is one of the clearest cases where MSP delivery wins on economics. Three reasons.
The first is platform cost. A commercial monitoring stack with proper coverage costs an SMB $100-$500 per month direct, plus internal time to run it. An MSP spreads the same platform across many clients and includes monitoring in the per-user managed services fee with no separate line item. The cost-per-monitored-device is dramatically lower under the MSP model.
The second is on-call. Network monitoring without someone responding to alerts is decorative. Most SMBs cannot staff a 24×7 on-call rotation for one or two people without burning them out. The MSP runs an on-call rotation across many clients, so the per-client load is sustainable.
The third is tuning. Keeping alert thresholds well-tuned over time is a recurring chore. An MSP that runs the same monitoring platform across 50 clients gets very good at tuning by sheer volume; an internal IT generalist tuning one deployment is starting from scratch every time. The institutional knowledge advantage is real.
The case for internal monitoring is narrower than people expect. It makes sense for SMBs with a dedicated IT team of three or more, where one person is genuinely interested in running the monitoring stack and has the capacity. It also makes sense for environments with extremely sensitive data where centralizing monitoring through an MSP probe is a compliance concern. Outside those cases, MSP delivery is the realistic answer.
A network assessment is the right starting point for any SMB unsure whether their current monitoring is real or decorative. The assessment produces a documented baseline of what is being monitored, what is not, and what the gaps cost.
How Sequentur fits in
Sequentur is a security-first MSP / MSSP for small and mid-sized businesses across the 15-to-250-employee range, including both general SMBs and regulated industries like healthcare, legal, financial services, and defense contractors. If you want help standing up network monitoring that actually generates actionable alerts, evaluating whether your current monitoring is real coverage or a line item, or handing off monitoring entirely so your internal IT can focus on higher-leverage work, schedule a call and we can take a look.