latest updates from easySERVICE™
There has been a lot of change in backup in the last decade. It is tempting to quickly choose a storage appliance with deduplication or sign on with a cloud provider, but the hidden costs can be substantial, and your security, backup performance, restore performance, and instant recoveries can all suffer greatly.
Backup solutions can vary enormously and it is important to be aware of the consequences of choosing one solution over another. While many services provide excellent protection, some cover just data while others cover entire systems. Some enable faster recovery of your systems in the event of a major problem, while others ensure that your systems are safe from physical risks such as fires, floods or theft.
Good records management includes both backup and archiving. However, while these terms are often used interchangeably, it is important to distinguish between them when considering a records management process.
Backup is used to recover data after loss or a disaster, while archiving is used to preserve and retrieve data in the event of an inquiry or litigation. In simple terms, think of backup as short-term and archiving as long-term. Data protection policies and legal requirements also need to be considered, as some cloud backup solutions may not be suitable for US businesses due to where the data is ultimately stored.
Here are the different solutions and the use cases they best serve:
In the 1980s and 1990s, all organizations backed up to tape onsite and made tape copies to go offsite. Even though spinning random-access disk was used for primary storage, tape continued to dominate backup, mainly due to cost. Most customers keep weeks to months of retention onsite and months to years of retention offsite. As a result, backups can be restored from any point in time: to find deleted or overwritten files, to restore data for legal discovery or a financial audit, or to satisfy any other request for data from a past point in time.
In the 1990s, the backup applications could only write natively to tape, but the cost of disk was falling. Middleware software called VTL (virtual tape library) emerged and allowed the backup application to write to the VTL software as though it was a tape library and, in turn, the VTL software would then write to disk. Around 2000, serial ATA (SATA) drives were introduced at higher densities and with better reliability for use in the datacenter. With VTL middleware and low-cost SATA drives, organizations could introduce disk into the backup process. This is called disk staging.
Since tape can be slow and unreliable, introducing disk into the process allowed for faster and more reliable backups and restores. At this point, tape is relegated to longer-term retention and to recovery from a site disaster using offsite tapes.
Due to the cost of disk, organizations could not afford to replace tape with disk outright. However, alongside the advent of low-cost SATA drives and later low-cost SAS drives, a new technology emerged called “data deduplication.” Backups are highly redundant: each week largely the same data is backed up, and only about 2% of it changes from week to week. If you have a 50TB environment and you do incrementals during the week and then a full backup on Friday night, the week-to-week change is only about 1TB, or 2% of the data at the block or byte level.
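The arithmetic above can be sketched as a quick back-of-the-envelope calculation. The 50TB environment size and 2% weekly change rate come from the example; the 12-week retention figure is an illustrative assumption:

```python
# Back-of-the-envelope: storage needed to retain weekly full backups,
# with and without deduplication. Figures are illustrative.

def storage_without_dedup(full_tb: float, weeks: int) -> float:
    """Each weekly full backup is stored in its entirety."""
    return full_tb * weeks

def storage_with_dedup(full_tb: float, weeks: int, change_rate: float) -> float:
    """First full is stored whole; each later week adds only changed data."""
    return full_tb + full_tb * change_rate * (weeks - 1)

full_tb = 50.0       # 50 TB environment (from the example above)
change_rate = 0.02   # ~2% of data changes from week to week
weeks = 12           # ~3 months of weekly retention (assumed)

raw = storage_without_dedup(full_tb, weeks)
deduped = storage_with_dedup(full_tb, weeks, change_rate)
print(f"without dedup: {raw:.0f} TB, with dedup: {deduped:.0f} TB")
print(f"effective ratio: {raw / deduped:.1f}:1")
```

With these numbers, storing twelve weekly fulls raw takes 600TB, while deduplicated retention takes only about 61TB, which is exactly why redundant backup data is such a good fit for deduplication.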
However, it is not as simple as just adding data deduplication to disk since how data deduplication is implemented can greatly impact backup performance, restore performance, backup window length, and total cost. There are three architectural implementations for disk with data deduplication: data deduplication in the backup software, inline data deduplication in a scale-up appliance, and data deduplication with a landing zone in a scale-out appliance.
Inline deduplication in a scale-up appliance can slow down backups, as a highly compute-intensive process is performed during the backup. In addition, once the data lands on disk, it is always stored deduplicated. Restores, offsite tape copies, and instant VM recoveries are slow, since the data needs to be put back together, or “rehydrated,” every time. Moreover, as the data grows, the backup window expands, since a scale-up architecture has a single head-end controller and adds only capacity, not compute, as data grows.
Dedicated disk-based backup appliances with data deduplication employ far more aggressive deduplication algorithms and achieve a much higher deduplication rate of between 10:1 and 50:1, with an average of about 20:1. These appliances use far less disk and bandwidth than deduplication performed in software and, as a result, are far less costly.
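A quick sketch shows what those ratios mean for physical disk. The 600TB logical figure is an illustrative assumption, not a number from the text:

```python
# How much physical disk is needed to hold a given amount of logical
# (pre-deduplication) backup data at different deduplication ratios.

def physical_tb(logical_tb: float, dedup_ratio: float) -> float:
    """Physical capacity required after deduplication."""
    return logical_tb / dedup_ratio

logical = 600.0  # TB of logical backup data retained (assumed)
for ratio in (10, 20, 50):
    print(f"{ratio}:1 -> {physical_tb(logical, ratio):.0f} TB of physical disk")
```

At the average 20:1 ratio, 600TB of retained backups fits on roughly 30TB of disk, which is what makes disk economically competitive with tape for retention.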
The second generation appliances have scale-out architecture and a backup target landing zone. Backups are sent directly to a disk landing zone to avoid the compute-intensive deduplication process, which results in faster backups. Also, since data is not deduplicated on the fly, the most recent backups can be kept in the landing zone in their complete undeduplicated form.
Maintaining the most recent backups in their complete form avoids the need for rehydration of the data, resulting in fast restores, fast offsite tape copies, and instant VM recoveries in single-digit minutes versus hours. In addition, the second generation appliances each come with all associated resources: processor, memory, and bandwidth in addition to disk. As the data grows, all compute resources are added along with disk capacity.
An increase of data to be deduplicated requires an increase in processor and memory. Since scale-out appliances contain processor, memory, and bandwidth along with disk capacity, the backup window never expands which eliminates the need for forklift upgrades. As appliances are added into a GRID, the throughput and ingest rate continue to increase.
Many backup applications have added data deduplication into the backup clients/agents, into the media server, or both. These implementations are adequate for small amounts of data or low retention of one to four weeks, but are challenged as the data and retention grow. The reason is that data deduplication is very processor- and memory-intensive since there is a great deal of compute required to break the data stream into small blocks, zones, or bytes and then do a compare.
Because this process is so compute-intensive, backup software implementations use far less aggressive algorithms so that backup performance isn’t impacted. The clients/agents and media servers are already carrying a heavy task load, and adding compute-intensive data deduplication would greatly slow backups down. Instead of using very granular blocks, zones, or bytes, the backup software implementations use much larger block sizes such as 64KB or 120KB.
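The block-size trade-off described above can be illustrated with a minimal fixed-block deduplication sketch: split a stream into blocks, hash each block, and count how many are unique. The synthetic data stream and block sizes are illustrative; real products use more sophisticated variable-length chunking:

```python
import hashlib

def dedup_stats(data: bytes, block_size: int) -> tuple[int, int]:
    """Split data into fixed-size blocks, hash each block, and count
    unique blocks. Returns (total_blocks, unique_blocks)."""
    seen = set()
    total = 0
    for i in range(0, len(data), block_size):
        total += 1
        seen.add(hashlib.sha256(data[i:i + block_size]).hexdigest())
    return total, len(seen)

# A synthetic stream with heavy repetition, like weekly fulls of
# mostly-unchanged data (illustrative, not real backup data).
stream = (b"A" * 4096 + b"B" * 4096) * 100 + b"C" * 512

for block_size in (512, 65536):  # granular blocks vs. coarse 64KB blocks
    total, unique = dedup_stats(stream, block_size)
    print(f"{block_size}B blocks: {unique}/{total} unique "
          f"({total / unique:.1f}:1 ratio)")
```

Smaller blocks find far more duplicates in this stream, but they also mean many more hash computations and comparisons, which is exactly the compute cost that pushes backup software toward the larger, less effective block sizes.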
The term “cloud” is being used broadly for every type of service imaginable. It has an alluring appeal: you don’t need to own anything, you don’t have to run anything, you pay for only what you use, and you can dial your usage up and down as required. Clearly, more and more IT infrastructure will move to the cloud over time. There are four major requirements for backup: data security, recovery point, time to recover, and cost. With the cloud, the data of different organizations is commingled on the same storage.
For most organizations, this does not meet their security requirements. Second, since backup moves a large amount of data on a daily and weekly basis, bandwidth is very important. Sufficient bandwidth is needed to ensure the data gets into the cloud in a timely fashion to meet the recovery point objective (RPO), and to ensure that data can be retrieved from the cloud in a timely fashion for restores or disaster recovery, meeting the recovery time objective (RTO).
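The bandwidth constraint is easy to quantify. A minimal sketch, assuming a 5TB restore over a 100 Mbps link (both figures illustrative, and ignoring protocol overhead and link contention):

```python
# Time to move backup data over a WAN link, the constraint behind
# cloud backup RPO/RTO. Figures are illustrative assumptions.

def transfer_hours(data_tb: float, mbps: float) -> float:
    """Hours to move data_tb terabytes over an mbps megabit-per-second link."""
    bits = data_tb * 1e12 * 8       # decimal TB -> bits
    return bits / (mbps * 1e6) / 3600

# Restoring 5 TB from the cloud over a 100 Mbps line:
hours = transfer_hours(5, 100)
print(f"{hours:.0f} hours (~{hours / 24:.1f} days)")
```

At 100 Mbps, pulling 5TB back takes on the order of 111 hours, nearly five days, which is why even a few terabytes can put cloud restore times well outside a typical RTO.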
Due to these constraints, you tend to see smaller organizations with just a few terabytes of data utilizing the cloud. As soon as the organization surpasses 4 or 5 terabytes of data to be backed up, the cloud cannot meet the RTO, RPO, or security requirements.
When you factor in the cost of bandwidth plus the monthly per-gigabyte charge, a cloud solution becomes very expensive over a typical three-year (36-month) term.
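A simple sketch of that recurring cost, using hypothetical figures (the per-gigabyte rate, retained capacity, and term are assumptions for illustration, and bandwidth charges are excluded):

```python
# Recurring per-GB storage charges over a multi-year term.
# All figures are hypothetical; bandwidth costs are not included.

def cloud_storage_cost(tb_stored: float, per_gb_month: float,
                       months: int = 36) -> float:
    """Total storage charges over the term (decimal TB -> GB)."""
    return tb_stored * 1000 * per_gb_month * months

# Hypothetical: 10 TB retained at $0.10 per GB-month over 3 years.
print(f"${cloud_storage_cost(10, 0.10):,.0f}")
```

Even at these modest assumed rates, storage fees alone come to $36,000 over three years, before any bandwidth or restore charges, which is the comparison worth running against a one-time appliance purchase.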
Take the time to do the research and understand how each of these will impact your backup, restore and recovery objectives and requirements. Then choose the right solution for your business needs.
With the help of easySERVICE Data Services, organizations can now affordably and confidently scale their data protection strategy with the enterprise-level architecture of Backup Management Suite, with the following benefits:
This specially designed backup solution helps small to medium-sized organizations meet RPOs and RTOs, save time, eliminate risk, and dramatically reduce capital and operational costs. For organizations looking for a backup software solution that provides the highest levels of protection, it covers not just VMs but the entire IT infrastructure, be it physical Windows servers or virtual machines.
We focus on designing and building the most appropriate infrastructure for the unique needs and characteristics of your individual business. Your data is too precious not to be protected by the best, most affordable, and most efficient data storage solution in the industry. Our solution is suited to modern data protection, built for virtualization and private cloud, without a big price tag.
If you’d like to discuss any of the above best practices or lessons learned with us or to learn more about how we are partnering with companies just like yours to ensure the availability of mission-critical applications, please contact us at (855) US STELLAR.