Commvault backups: racing against the clock

When I started my current job, one of my assignments was to modify an existing Commvault Simpana backup installation to include offsite disaster recovery copies.  Data was to be copied between company data centers, from the local site to the East Coast DR site, over a shared 12 Mbps MPLS link.  Our goal was 30 days of local retention and 90 days offsite.

What was configured:

  • Two Network Appliance (NetApp) Filers at the local site, one Filer at the DR site.
  • Only local backups were configured with Commvault Simpana 9.
  • The local backups of NetApp NDMP data were using the NDMP server plugin.

What we added:

  • An additional virtual machine Media Agent running Commvault at the DR site.
  • The primary Media Agent was configured to replicate offsite (Auxiliary Copy) to the secondary Media Agent at the DR site.
  • Deduplication was configured on all shares.

Here are the backup sizes for each Storage Policy:

Filer1: 696 GB (343 GB deduplicated)

Filer2: 1.8 TB (798 GB deduplicated)

Linux: 680 GB (300 GB deduplicated)

Windows: 350 GB (283 GB deduplicated)
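
As a rough sanity check, the deduplicated sizes above can be turned into a best-case transfer time over the 12 Mbps link. This is only a sketch under stated assumptions: decimal units (1 GB = 8,000 megabits), the whole link available to backups, and no protocol overhead. None of these hold exactly on a shared MPLS link, so real times will be longer.

```python
# Best-case WAN transfer time for one full copy of each storage policy.
# Assumptions (not measured): decimal units, the full 12 Mbps available
# to backups, and no protocol overhead.

LINK_MBPS = 12  # shared MPLS link speed

# Deduplicated backup sizes in GB, taken from the list above.
DEDUP_SIZES_GB = {
    "Filer1": 343,
    "Filer2": 798,
    "Linux": 300,
    "Windows": 283,
}


def transfer_days(size_gb: float, link_mbps: float = LINK_MBPS) -> float:
    """Return the days needed to send size_gb at a sustained link_mbps."""
    megabits = size_gb * 8 * 1000  # GB -> megabits (decimal)
    seconds = megabits / link_mbps
    return seconds / 86400  # seconds per day


if __name__ == "__main__":
    for name, size in DEDUP_SIZES_GB.items():
        print(f"{name}: {transfer_days(size):.1f} days")
```

Under these assumptions Filer2 alone needs roughly six days of uninterrupted transfer, and each of the others needs more than two, so a full copy of everything cannot fit in a nightly window on this link.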

Filer2 was taking two weeks to copy one Full job to the remote site! This was far outside our desired backup window/RPO. Filer2 was slow because it had too many things bundled into one large (1.8 TB) subclient, which took about 10 hours to back up 10 million files locally.  I asked Commvault support, and they recommended we divide it up into several subclients.  Each subclient still has a lot of small files to manage:

Filer2-chunk1: 718 GB – 2 million files

Filer2-chunk2: 195 GB (included vol0) – 1.5 million files

Filer2-chunk3: 343 GB – 800K files

Filer2-chunk4: 269 GB – 2.95 million files

Filer2-chunk5: 342 GB – 800K files

As the subclients got closer to about 300 GB each, they seemed to become more manageable to send over to the remote site.
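
To put the "lots of small files" point in numbers, the sizes and file counts listed for the Filer2 subclients can be turned into an average file size. This is a quick sketch that assumes decimal units (1 GB = 1,000,000 KB) and treats the rounded file counts as exact.

```python
# Average file size per Filer2 subclient, from the sizes and counts above.
# Assumption: decimal units (1 GB = 1,000,000 KB); counts are the rounded
# figures quoted in the list, not exact values.

# name -> (size in GB, number of files)
CHUNKS = {
    "Filer2-chunk1": (718, 2_000_000),
    "Filer2-chunk2": (195, 1_500_000),
    "Filer2-chunk3": (343, 800_000),
    "Filer2-chunk4": (269, 2_950_000),
    "Filer2-chunk5": (342, 800_000),
}


def avg_file_kb(size_gb: float, files: int) -> float:
    """Return the average file size in KB for a subclient."""
    return size_gb * 1_000_000 / files


if __name__ == "__main__":
    for name, (size_gb, files) in CHUNKS.items():
        print(f"{name}: {avg_file_kb(size_gb, files):.0f} KB per file")
```

The averages come out between roughly 90 KB and 430 KB per file, which fits the picture of backups dominated by per-file overhead rather than raw volume.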

What we succeeded in doing:

  • Created incremental storage policies and linked them to the primary policies
  • Spread out Full copies so they run only about every 3 months (only on the largest jobs)
  • Enabled “DASH” Network-optimized Auxiliary Copies to only copy changed data

The big improvement was configuring both Filer storage policies as incremental, so that we could reduce the amount of data frequently going over the WAN to the remote site. Offsite incremental copy times were greatly improved:

  • Filer1: 15 GB in 30 minutes.
  • Filer2: 16 GB in 30 minutes.
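
For comparison with the 12 Mbps link, the incremental figures above can be converted into an implied data rate. This sketch assumes decimal units and assumes the quoted sizes are the logical size of the incremental jobs rather than the bytes actually sent over the wire; the numbers above do not say which.

```python
# Implied data rate of the offsite incremental copies.
# Assumptions: decimal units (1 GB = 8,000 megabits), and the quoted
# sizes are logical job sizes, not bytes actually sent over the WAN.

LINK_MBPS = 12  # shared MPLS link speed

# name -> (size in GB, duration in minutes), from the list above
INCREMENTALS = {
    "Filer1": (15, 30),
    "Filer2": (16, 30),
}


def effective_mbps(size_gb: float, minutes: float) -> float:
    """Return the implied rate in megabits per second."""
    return size_gb * 8 * 1000 / (minutes * 60)


if __name__ == "__main__":
    for name, (size_gb, minutes) in INCREMENTALS.items():
        rate = effective_mbps(size_gb, minutes)
        print(f"{name}: {rate:.1f} Mbps implied vs {LINK_MBPS} Mbps link")
```

Both implied rates are several times the link speed. If the sizes really are logical sizes, that would be consistent with the deduplicated Auxiliary Copy sending only changed blocks instead of the full incremental.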

It was a race to complete backups within the time window, but we were finally sneaking in under an acceptable time.