Synthetic Backups: An Advanced Technical Overview – Understanding the Key Aspects of a SAN Storage Environment

Data protection strategies have evolved significantly to address the exponential growth of data volumes in enterprise environments. Traditional backup methodologies often struggle to meet shrinking recovery point objectives (RPOs) and recovery time objectives (RTOs). Synthetic backups have emerged as a pivotal mechanism to address these challenges, offering a sophisticated approach to data assembly that optimizes both network bandwidth and storage I/O.

Traditional Full vs. Synthetic Architectures

To understand the utility of synthetic backups, one must first critique the traditional full backup model. A standard active full backup reads all data from the source volume and writes it to the target storage. While this provides a complete, independent copy, it is resource-intensive. It saturates network bandwidth and places a heavy I/O load on production servers, often necessitating extended backup windows that disrupt operations.

In contrast, a synthetic backup does not interact with the production source to generate a full file. Instead, it synthesizes a full backup file on the target storage repository by aggregating data from a previous full backup and subsequent incremental backups. This process occurs entirely on the backup server or storage appliance, effectively offloading the processing burden from the production environment.

The Mechanics of Synthesis

The technical execution of a synthetic backup involves complex pointer manipulation and block-level processing. The backup application identifies the data blocks required to construct a new full backup image. It then assembles these blocks from existing files in the backup chain.

There are two primary methodologies for achieving this:

1. Synthetic Full (Forward Incremental)

In this scenario, the backup server reads blocks from the most recent full backup and all subsequent incremental backups. It consolidates these blocks into a new full backup file (VBK in Veeam terminology, for example). The result is identical to an active full backup but is created without touching the production disk.
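
The consolidation step can be sketched in a few lines of Python. This is a minimal illustration, not how any particular product implements it: each backup file is modeled as a dictionary mapping a block index to block contents, and the name synthesize_full is hypothetical.

```python
from typing import Dict, List

# Assumption: a backup file is modeled as {block index: block contents}.
BlockMap = Dict[int, bytes]


def synthesize_full(base_full: BlockMap, incrementals: List[BlockMap]) -> BlockMap:
    """Build a new full image from a base full and its incrementals.

    The incrementals must be ordered oldest first, so that the most
    recent version of each block overwrites any earlier version.
    """
    synthetic = dict(base_full)  # copy, so the existing chain is left untouched
    for incremental in incrementals:
        synthetic.update(incremental)
    return synthetic


# Example chain: one full backup followed by two incrementals.
sunday_full = {0: b"A0", 1: b"B0", 2: b"C0"}
monday_inc = {1: b"B1"}
tuesday_inc = {1: b"B2", 3: b"D2"}

new_full = synthesize_full(sunday_full, [monday_inc, tuesday_inc])
print(new_full)
```

Only files that already sit on the repository are read here, which is the point of the technique: the production volume is not involved in building the new full.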

2. Reverse Incremental

This method is more I/O-intensive on the backup repository during the backup window, but it speeds up restores of the most recent restore point. When a new incremental backup is taken, the software immediately injects the changed blocks into the existing full backup file. The blocks that were replaced are written out to a reverse incremental file (VRB). As a result, the full backup file on disk always represents the most recent state of the system.
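
A rough Python sketch of the reverse incremental idea follows. It assumes the same simplified model of a backup file as a dictionary of block index to contents; the function names apply_reverse_incremental and restore_previous are hypothetical and do not correspond to any vendor API.

```python
from typing import Dict, Optional

BlockMap = Dict[int, bytes]
# A rollback file stores the old contents of each replaced block.
# None marks a block that did not exist before this backup run.
RollbackMap = Dict[int, Optional[bytes]]


def apply_reverse_incremental(full: BlockMap, changed: BlockMap) -> RollbackMap:
    """Inject changed blocks into the full file in place.

    Returns the displaced blocks, which play the role of the
    reverse incremental file.
    """
    rollback: RollbackMap = {}
    for index, data in changed.items():
        rollback[index] = full.get(index)  # save the old block (or None)
        full[index] = data                 # the full now holds the newest block
    return rollback


def restore_previous(full: BlockMap, rollback: RollbackMap) -> BlockMap:
    """Rebuild the previous restore point from the full plus one rollback file."""
    previous = dict(full)
    for index, old in rollback.items():
        if old is None:
            previous.pop(index, None)  # block was new in the latest run
        else:
            previous[index] = old
    return previous


full_file = {0: b"A0", 1: b"B0"}
rollback_file = apply_reverse_incremental(full_file, {1: b"B1", 2: b"C1"})
print(full_file)                                   # latest state, ready to restore
print(restore_previous(full_file, rollback_file))  # state before the last run
```

In this model, restoring the latest point needs only the full file, while each older point requires applying one more rollback file. That is the trade-off described above.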

Advantages of Synthetic Operations

The primary advantage of synthetic backups is the drastic reduction in the backup window. Since only incremental changes are transferred across the network, the duration of data transfer is minimal compared to an active full backup.

Furthermore, synthetic operations reduce the load on primary storage arrays. By performing the heavy lifting of file assembly on the backup target, production resources remain available for business-critical applications. This approach also facilitates "forever incremental" strategies, where a full backup is taken only once, and all subsequent fulls are synthesized, saving massive amounts of network bandwidth over time.
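
The bandwidth saving can be estimated with simple arithmetic. The sketch below is an illustrative model only: the function name network_transfer_tb, the 10 TB dataset, and the 2% daily change rate are assumptions chosen for the example, not figures from a real environment.

```python
from typing import Optional


def network_transfer_tb(
    full_tb: float,
    daily_change_rate: float,
    days: int,
    active_full_every: Optional[int] = None,
) -> float:
    """Total data sent over the network across `days` daily backups.

    An initial full backup is always counted. If `active_full_every`
    is set, every Nth day sends a complete active full instead of an
    incremental. If it is None, only incrementals follow the initial
    full (a forever-incremental schedule).
    """
    total = full_tb  # the one-time initial full
    for day in range(1, days + 1):
        if active_full_every and day % active_full_every == 0:
            total += full_tb
        else:
            total += full_tb * daily_change_rate
    return total


weekly_active = network_transfer_tb(10, 0.02, 28, active_full_every=7)
forever_incremental = network_transfer_tb(10, 0.02, 28)
print(f"weekly active fulls: {weekly_active:.1f} TB")
print(f"forever incremental: {forever_incremental:.1f} TB")
```

Under these assumptions, four weeks of backups move roughly 54.8 TB with weekly active fulls and about 15.6 TB with a forever-incremental schedule. The synthetic fulls themselves add no network traffic, because they are built on the repository.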

Considerations and Drawbacks

Despite their efficiency, synthetic backups are not without trade-offs. The synthesis process is I/O-intensive for the backup storage target, because the blocks that make up the new full are scattered across the files in the backup chain and must be read and rewritten. If the underlying storage hardware lacks sufficient random I/O performance (IOPS), the synthesis process can be slow, potentially exceeding the time it would take to perform an active full backup over a fast network.
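
A back-of-the-envelope comparison shows where the break-even point might fall. The model below is deliberately crude and rests on stated assumptions: every block costs one random read plus one random write on the repository, the network runs at its full line rate, and the figures (4,096 GB, 1,024 KB blocks, 500 or 5,000 IOPS, 10 Gbit/s) are hypothetical.

```python
def synthesis_hours(full_gb: float, block_kb: float, repo_iops: float,
                    ios_per_block: int = 2) -> float:
    """Rough time to synthesize a full on the repository.

    Assumes each block needs `ios_per_block` random I/O operations
    (one read and one write by default).
    """
    blocks = full_gb * 1024 * 1024 / block_kb
    return blocks * ios_per_block / repo_iops / 3600


def active_full_hours(full_gb: float, network_gbps: float) -> float:
    """Rough time to send a full over the network at line rate."""
    return full_gb * 8 / network_gbps / 3600


slow_repo = synthesis_hours(4096, 1024, repo_iops=500)
fast_repo = synthesis_hours(4096, 1024, repo_iops=5000)
active = active_full_hours(4096, network_gbps=10)

print(f"synthesis at 500 IOPS:    {slow_repo:.2f} h")
print(f"synthesis at 5000 IOPS:   {fast_repo:.2f} h")
print(f"active full at 10 Gbit/s: {active:.2f} h")
```

With these numbers, the 500 IOPS repository takes several hours to synthesize the full, longer than the active full would take on the network, while the 5,000 IOPS repository finishes well ahead of it. Real results depend on block size, caching, and file system behavior, so treat this as a sizing aid rather than a prediction.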

Additionally, data integrity is a concern. If corruption occurs in a previous incremental file or in the base full backup, that corruption will propagate into the synthetic full backup. Advanced error correction and regular health checks (such as CRC checks) are mandatory to mitigate this risk.
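
A health check of this kind can be sketched with the CRC-32 function from Python's standard zlib module. The surrounding structure is an assumption made for illustration: blocks are held in a dictionary, checksums are recorded at backup time, and the names checksum_blocks and find_corrupt_blocks are hypothetical.

```python
import zlib
from typing import Dict, List

BlockMap = Dict[int, bytes]


def checksum_blocks(blocks: BlockMap) -> Dict[int, int]:
    """Record a CRC-32 value for every block at backup time."""
    return {index: zlib.crc32(data) for index, data in blocks.items()}


def find_corrupt_blocks(blocks: BlockMap, expected: Dict[int, int]) -> List[int]:
    """Return indexes of blocks that are missing or fail their CRC check."""
    return sorted(
        index
        for index, crc in expected.items()
        if index not in blocks or zlib.crc32(blocks[index]) != crc
    )


stored = {0: b"alpha", 1: b"beta", 2: b"gamma"}
recorded = checksum_blocks(stored)

stored[1] = b"bet4"  # simulate silent corruption on the repository
del stored[2]        # simulate a lost block

print(find_corrupt_blocks(stored, recorded))
```

Running such a check before synthesis, rather than after, keeps a damaged block from being copied into the new full.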

Strategic Use Cases

Synthetic backups are particularly advantageous in environments with:

Limited Bandwidth: Remote sites or branch offices with slow WAN links benefit significantly as they only transmit changed blocks.

Large Databases: For multi-terabyte SQL or Oracle databases where active full backups would exceed the available maintenance window.

Virtualization: Virtual machine image-level backups are ideally suited for synthetic assembly due to their block-based nature.

Implementation Prerequisites

Implementing synthetic backups requires careful selection of backup software and hardware. The backup software must support changed block tracking (such as VMware Changed Block Tracking, CBT, or Microsoft Hyper-V Resilient Change Tracking, RCT) so that it can identify changed blocks without scanning the entire volume.
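
The idea behind block tracking can be shown with a toy model. This is not the VMware or Hyper-V interface. It is a hypothetical TrackedVolume class that records which block indexes were written since the last backup, so an incremental can be produced without reading the whole volume.

```python
from typing import Dict, List, Set


class TrackedVolume:
    """Toy volume that remembers which blocks changed since the last backup."""

    def __init__(self, block_count: int, block_size: int = 4) -> None:
        self.blocks: List[bytes] = [bytes(block_size)] * block_count
        self.dirty: Set[int] = set()

    def write(self, index: int, data: bytes) -> None:
        """Write a block and mark it as changed."""
        self.blocks[index] = data
        self.dirty.add(index)

    def take_incremental(self) -> Dict[int, bytes]:
        """Return only the changed blocks, then reset the tracking state."""
        changed = {index: self.blocks[index] for index in sorted(self.dirty)}
        self.dirty.clear()
        return changed


volume = TrackedVolume(block_count=8)
volume.write(2, b"aaaa")
volume.write(5, b"bbbb")
volume.write(2, b"cccc")  # rewriting a block still yields one entry

first_incremental = volume.take_incremental()
second_incremental = volume.take_incremental()
print(first_incremental)
print(second_incremental)
```

The first incremental contains only blocks 2 and 5, and the second is empty because nothing changed in between. Without tracking, the software would have to read and compare all eight blocks to reach the same result.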

On the hardware side, the backup repository should use file systems or appliances optimized for synthetic operations. Deduplication appliances often have specific integration protocols (like DD Boost for Data Domain) that offload the synthesis process to the appliance itself, significantly improving performance.

Optimizing Data Protection

Synthetic backups represent a mature, high-efficiency alternative to traditional backup methods. They solve critical problems related to backup windows and production impact but introduce new requirements for storage performance and data integrity verification. For organizations managing large datasets with stringent SLAs, implementing synthetic full backups is often the most logical architectural choice.