DATTO Backup Performance Analysis Report

Prepared for: Management Review
Date: December 17, 2025
Prepared by: IT Infrastructure Team
Subject: Investigation of Slow VMware VM Backup Performance to DATTO Appliance


Executive Summary

This investigation set out to determine the root cause of extremely slow backup speeds (2-5 MB/s) when backing up VMware virtual machines to the DATTO backup appliance. Although the 10 Gbps network infrastructure is capable of 1,000+ MB/s throughput, backups are running at less than 1% of network capacity.

Key Finding: The network infrastructure (HP switch, cabling, VLANs) has been ruled out as the cause. The bottleneck has been identified as the DATTO backup agent software running inside the Windows virtual machines, specifically the MercuryFTP protocol used for data transfer.

Recommendation: Engage DATTO support with the evidence documented in this report to resolve the software-level performance issue.


Problem Statement

| Metric | Expected | Actual | Gap |
|---|---|---|---|
| Network Capacity | 10 Gbps (1,250 MB/s) | - | - |
| Practical Throughput | 100-500 MB/s | 2-5 MB/s | 99% under capacity |
| 8 TB File Server Backup | 4-8 hours | 24-48+ hours | 6-12x longer |

The slow backup speeds are causing:

  • Extended backup windows overlapping with business hours
  • Incomplete backup jobs
  • Increased risk of data loss due to stale recovery points

Infrastructure Overview

Network Topology

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│                        HP 5406R zl2 Switch                              │
│                        (10Gbps Infrastructure)                          │
│                                                                         │
│    ┌──────────────┐                           ┌──────────────┐          │
│    │   CMIFS02    │                           │  DATTOBU02   │          │
│    │ File Server  │                           │   Backup     │          │
│    │    8.7 TB    │                           │  Appliance   │          │
│    └──────┬───────┘                           └──────┬───────┘          │
│           │                                          │                  │
│      VLAN 212                                   VLAN 250                │
│    (FileServer)                              (IT-Management)            │
│           │                                          │                  │
│    ┌──────┴───────┐                           ┌──────┴───────┐          │
│    │   Port F2    │                           │   Port E5    │          │
│    │   10 Gbps    │                           │   10 Gbps    │          │
│    │   Status: UP │                           │   Status: UP │          │
│    └──────┬───────┘                           └──────┬───────┘          │
│           │                                          │                  │
│           │         ┌──────────────┐                 │                  │
│           │         │   Port A20   │                 │                  │
│           └────────►│   1 Gbps     │◄────────────────┘                  │
│                     │   Router     │                                    │
│                     │ (Inter-VLAN) │                                    │
│                     └──────────────┘                                    │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Device Identification

| Device | MAC Address | Switch Port | Speed | VLAN | Status |
|---|---|---|---|---|---|
| CMIFS02 (File Server) | 00:50:56:8F:35:77 | F2 | 10 Gbps | 212 | Up |
| DATTOBU02 (Backup) | 6C:92:CF:17:BD:20 | E5 | 10 Gbps | 250 | Up |
| Router/Firewall | Multiple | A20 | 1 Gbps | Multi | Up |
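
The topology places the file server and the backup appliance in different VLANs, with the inter-VLAN router attached at 1 Gbps. As a minimal sketch, assuming backup traffic really does traverse port A20 (the diagram suggests this, but it is not confirmed by a packet trace), the ceiling for that path is set by its slowest link. The dictionary and function names are illustrative only.

```python
# Link speeds in Gbps, taken from the Device Identification table.
LINK_SPEED_GBPS = {"F2": 10, "A20": 1, "E5": 10}


def gbps_to_mb_per_s(gbps: float) -> float:
    """Convert a line rate in Gbps to MB/s (decimal units, 8 bits per byte)."""
    return gbps * 1000 / 8


def path_ceiling_mb_per_s(ports: list[str]) -> float:
    """A path can carry no more than its slowest link."""
    return gbps_to_mb_per_s(min(LINK_SPEED_GBPS[p] for p in ports))


# Assumed path: file server port -> inter-VLAN router -> backup appliance port.
routed_path = ["F2", "A20", "E5"]
print(path_ceiling_mb_per_s(routed_path))  # 125.0
```

Even against this 125 MB/s ceiling, the observed 2-5 MB/s is far below what the path can carry.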

Investigation Results

1. HP Switch Analysis - PASSED

The HP 5406R zl2 switch was thoroughly analyzed and cleared of any issues.

System Health

| Metric | Value | Assessment |
|---|---|---|
| Uptime | 242 days | Stable |
| CPU Utilization | 0% | Excellent |
| Memory Free | 72% | Excellent |
| Packet Buffers Missed | 0 | No packet drops |

Port Status

| Port | Device | Link Speed | Errors | Drops |
|---|---|---|---|---|
| E5 | DATTO Appliance | 10 Gbps | None | None |
| F2 | VMware ESXi Host | 10 Gbps | None | None |
| A20 | Router | 1 Gbps | None | None |

Configuration Review

| Setting | Configuration | Impact on Backups |
|---|---|---|
| QoS / Rate Limiting | None configured | No throttling |
| Port Security | No restrictions | No blocking |
| Spanning Tree | Disabled | No blocked ports |
| Broadcast Limits | None (0) | No limits |
| Flow Control | Off (normal) | No impact |

Conclusion: Switch is operating normally with zero packet loss and no throttling mechanisms.


2. VMware Performance Analysis - PASSED

Real-time performance monitoring was conducted during an active backup using the vCenter Performance API.

During Active Backup (CMIFS02)

| Metric | Value | Assessment |
|---|---|---|
| Disk Read Speed | 53-76 MB/s | Good - VM reading data quickly |
| Disk Latency | 2 ms | Excellent - no storage bottleneck |
| CPU Usage | <10% | Good - not CPU bound |
| Network TX | 0.4-0.5 MB/s | BOTTLENECK IDENTIFIED |

Historical Analysis (30 Days - CMIFS02)

| Metric | Average | Maximum | Assessment |
|---|---|---|---|
| CPU Usage | 5.7% | 10.4% | No issues |
| Disk Latency | 1.5 ms | 15 ms | Excellent |
| Memory Usage | Normal | Normal | No issues |

Critical Finding: The VM reads from disk at up to 76 MB/s but transmits only 0.5 MB/s to the network. This is roughly a 150:1 ratio, indicating that the bottleneck is inside the VM, not the network.
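
The ratio quoted above follows directly from the two peak figures in the table. A short worked check, using only the reported values (the function name is illustrative):

```python
def read_to_transmit_ratio(disk_read_mb_s: float, network_tx_mb_s: float) -> float:
    """How many MB are read from disk for every MB sent to the network."""
    return disk_read_mb_s / network_tx_mb_s


# Peak disk read and peak network transmit observed during the active backup.
ratio = read_to_transmit_ratio(76, 0.5)
print(f"{ratio:.0f}:1")  # 152:1
```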


3. DATTO Appliance Analysis - ISSUES FOUND

Review of DATTO appliance logs revealed multiple problems:

| Issue | Description | Severity |
|---|---|---|
| Zpool Capacity Exceeded | Storage pool at or near capacity | High |
| High CPU Load | "Load average exceeds 2x number of cores" | High |
| HIR Failures | "Failed to copy bootmgfw.efi" on Windows Server 2025 | Medium |
| Backups Paused | Some agents showing "paused indefinitely" | High |
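
The "Load average exceeds 2x number of cores" alert describes a simple threshold. The sketch below shows one way such a check could be expressed. It is an illustration of the condition as worded in the alert, not DATTO's actual implementation, and the sample numbers are hypothetical.

```python
import os


def load_exceeds_threshold(load_avg: float, cores: int, factor: float = 2.0) -> bool:
    """True when the load average is above `factor` times the core count."""
    return load_avg > factor * cores


# Hypothetical values: an 8-core system, so the threshold is a load of 16.
print(load_exceeds_threshold(19.5, 8))  # True
print(load_exceeds_threshold(12.0, 8))  # False

# On a Unix-like host, live values are available from the standard library.
try:
    one_minute, _, _ = os.getloadavg()
    print(load_exceeds_threshold(one_minute, os.cpu_count() or 1))
except (AttributeError, OSError):
    pass  # os.getloadavg() is unavailable on this platform
```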

DATTO Backup Method

The DATTO appliance uses an in-guest Windows agent that transfers data over the MercuryFTP protocol (a TLS-encrypted proprietary transfer). It does NOT use the VMware-native backup APIs (VADP).

Example from DATTO agent log:

Transport: MercuryFTP (TLS)
Backup Speed: 0.57 MB/s

Root Cause Analysis

Eliminated Causes

| Potential Cause | Evidence | Status |
|---|---|---|
| HP Switch | 0% CPU, 0 dropped packets, 10Gbps links up | Eliminated |
| Network Cabling | All ports showing 10GigFD negotiation | Eliminated |
| VLAN Configuration | Correct tagging, routing functional | Eliminated |
| VMware Storage | 2ms latency, 76 MB/s read speed | Eliminated |
| VMware CPU | <10% utilization during backup | Eliminated |
| ESXi Host | 10Gbps uplinks, no errors | Eliminated |

Confirmed Root Cause

DATTO Windows Agent / MercuryFTP Protocol Performance

Evidence:

  1. VM disk reads at 76 MB/s, network transmits at 0.5 MB/s (150:1 ratio)
  2. Bottleneck occurs between disk read and network transmission inside the VM
  3. DATTO appliance showing resource constraints (storage full, high CPU)
  4. Windows Server 2025 compatibility issues with DATTO HIR process

Bandwidth Utilization Analysis

Available Bandwidth vs. Actual Usage

10 Gbps ─┬─────────────────────────────────────────────────── 1,250 MB/s
         │
         │
 1 Gbps ─┼─────────────────────────────────────────────────── 125 MB/s
         │  (Router inter-VLAN link - theoretical max for this path)
         │
         │
         │
100 MB/s ┼───────────────────────────────────────────────────
         │
         │
 10 MB/s ┼───────────────────────────────────────────────────
         │
  5 MB/s ┼─ ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ Peak observed
         │
  2 MB/s ┼─ ████████████████████████████████ Average observed
         │
  0 MB/s ┴───────────────────────────────────────────────────

         Actual backup speed: 2-5 MB/s (roughly 0.2-0.4% of the 10 Gbps link capacity)
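
The utilization figures can be reproduced with a short calculation. The sketch below uses decimal units (1 Gbps = 125 MB/s) and ignores protocol overhead. It also shows the same observed speeds against the 1 Gbps router link, on the assumption that backup traffic crosses it.

```python
def utilization_pct(observed_mb_s: float, link_gbps: float) -> float:
    """Observed throughput as a percentage of the link's line rate."""
    capacity_mb_s = link_gbps * 1000 / 8
    return 100 * observed_mb_s / capacity_mb_s


for observed in (2, 5):
    print(
        observed,
        round(utilization_pct(observed, 10), 2),  # against the 10 Gbps links
        round(utilization_pct(observed, 1), 2),   # against the 1 Gbps router link
    )
# 2 0.16 1.6
# 5 0.4 4.0
```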

Business Impact

Current State

| Metric | Value |
|---|---|
| CMIFS02 Data Volume | ~8.7 TB |
| Current Backup Speed | 2-5 MB/s |
| Full Backup Duration | 20-50 days (theoretical) |
| Incremental Backup Duration | Variable, often exceeds backup window |
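
The theoretical full-backup duration follows from the data volume and the observed speed. A minimal worked calculation, assuming decimal units (1 TB = 1,000,000 MB), a constant transfer rate, and no compression or deduplication:

```python
def transfer_days(size_tb: float, rate_mb_s: float) -> float:
    """Days needed to move `size_tb` terabytes at a constant `rate_mb_s`."""
    size_mb = size_tb * 1_000_000  # decimal TB -> MB
    seconds = size_mb / rate_mb_s
    return seconds / 86_400        # seconds per day


for rate in (5, 2):
    print(f"{rate} MB/s -> {transfer_days(8.7, rate):.1f} days")
# 5 MB/s -> 20.1 days
# 2 MB/s -> 50.3 days
```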

Risk Assessment

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Incomplete backups | High | High | Resolve DATTO performance |
| Data loss in disaster | Medium | Critical | Resolve DATTO performance |
| Backup window overlap with production | High | Medium | Resolve DATTO performance |

Recommendations

Immediate Actions

  1. Open DATTO Support Ticket

    • Provide this report as evidence
    • Request investigation of MercuryFTP protocol performance
    • Request review of appliance capacity (zpool full)
    • Inquire about Windows Server 2025 compatibility
  2. DATTO Appliance Maintenance

    • Address "Zpool capacity exceeded" warning
    • Review and clear old recovery points if possible
    • Investigate "backups paused indefinitely" status

Questions for DATTO Support

  1. Why is MercuryFTP only achieving 0.5 MB/s when the network supports 1,000+ MB/s?
  2. Can the backup method be changed to use VMware VADP (agentless) instead of in-guest agent?
  3. Is Windows Server 2025 fully supported? (HIR failures observed)
  4. What is the recommended resolution for "Zpool capacity exceeded"?
  5. Are there tuning parameters for MercuryFTP transfer speeds?

Alternative Solutions (If DATTO Cannot Resolve)

| Solution | Pros | Cons |
|---|---|---|
| Veeam Backup & Replication | Native VMware VADP support, proven fast | Licensing cost, migration effort |
| Nakivo Backup | VMware-native, competitive pricing | Migration effort |
| VMware-level DATTO backup | Uses VADP instead of in-guest agent | May require DATTO configuration change |

Appendix A: Switch Configuration Summary

Switch Model: HP 5406R zl2 (J9850A)
Firmware: KB.16.11.0020 (July 2024)
Management Modules: Dual (Active/Standby)

Key Ports:
- E5 (DATTOBU02): 10GbE-T, VLAN 250 untagged
- F2 (ESXi Host): 10GbE-T, VLAN 212 tagged
- A20 (Router): 1GbE, Multi-VLAN tagged

No QoS, rate limiting, or traffic shaping configured.

Appendix B: Evidence Summary

| Evidence Type | Source | Finding |
|---|---|---|
| Switch CPU/Memory | show system | 0% CPU, 72% memory free |
| Packet Drops | show system | 0 buffers missed |
| Port Status | show interfaces brief | All 10Gbps links up |
| VM Disk Performance | vCenter API | 76 MB/s read, 2ms latency |
| VM Network Performance | vCenter API | 0.5 MB/s TX during backup |
| DATTO Logs | Appliance UI | Zpool full, high CPU, HIR failures |
| Backup Speed | DATTO Agent | 2-5 MB/s via MercuryFTP |

Appendix C: Additional Switch Findings (Unrelated to Backup)

During the investigation, the following items were noted for separate remediation:

  1. Brute Force Login Attempts (12/14/2025)

    • Source: 10.254.50.24
    • Usernames: "admin", "Cisco"
    • Recommendation: Identify and investigate this device
  2. Port A24 Link Flapping (12/16/2025)

    • Third-party SFP+ DAC cable showing intermittent connectivity
    • Recommendation: Replace cable
  3. DATTOBU01 (Port F5) Offline

    • Second DATTO appliance not connected
    • Recommendation: Verify whether this is intentional or the appliance needs to be reconnected

Report End

This report was prepared using data collected from HP switch CLI, VMware vCenter Performance API, and DATTO appliance logs. All network infrastructure components have been verified as functioning correctly. The performance issue has been isolated to the DATTO backup software layer.