A virtual machine (VM) is a software implementation of a computing environment in which an operating system (OS) or program can be installed and run. The virtual machine typically emulates a physical computing environment, but requests for CPU, memory, hard disk, network and other hardware resources are managed by a virtualization layer which translates these requests to the underlying physical hardware. The virtualization layer can be used to create many individual, isolated VM environments.
Virtual machines can provide numerous advantages over installing operating systems and software directly on physical hardware. Isolation ensures that applications and services that run within a VM cannot interfere with the host OS or other VMs. VMs can also be easily moved, copied, and reassigned between host servers to optimize hardware resource utilization. Administrators can also take advantage of virtual environments to simplify backups, disaster recovery, new deployments and basic system administration tasks. The use of virtual machines also comes with several important management considerations, many of which can be addressed through general systems administration best practices and tools that are designed to manage VMs.
What can cause a VM restore failure?
When a virtual machine restoration fails, it is important to troubleshoot the problem quickly. The event logs are a good starting point, but sometimes the information in the event logs can be a bit cryptic and might not spell out exactly what the problem is.
1. The backup was corrupt: One of the most common causes of restoration failures is a corrupt backup. Backup corruption can occur as a result of media failures, communication failures, or any number of other causes. The only way to protect your organization against backup corruption is to periodically test your backups and correct any problems that you find.
Backup corruption can be really tough to troubleshoot. In these situations, the backup logs are going to be your best source of information. If the logs contain read errors then media problems might be to blame. Similarly, if the logs reflect a communications failure then you might have a bad cable or a bad I/O card.
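One simple form of periodic testing is to record a checksum when a backup file is written and compare it again later. The following is a minimal Python sketch, assuming that backups are stored as ordinary files and that a SHA-256 digest was saved at backup time. The function names and the workflow are illustrative and are not a feature of any particular backup product.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large backup files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path, expected_digest):
    """Compare the file's current digest with the one recorded at backup time."""
    return sha256_of_file(path) == expected_digest.lower()
```

A matching checksum only shows that the file has not changed since the digest was recorded. It does not prove that the backup can be restored, so it complements test restores rather than replacing them.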
2. VSS is not running: Backups of Windows servers are generally based on the Volume Shadow Copy Service (VSS). The backup application typically acts as a VSS requester and asks for a shadow copy. The VSS writers prepare application data for the backup, and a VSS provider creates the shadow copy itself. The coordination of the requester, the provider and the writers is handled by an operating system-level service called the Volume Shadow Copy Service. This service must be started in order for VSS to function correctly, and any required writers must also be running.
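You can check the state of the Volume Shadow Copy Service by using the Service Control Manager (for example, through the Services console or by running sc query vss at a command prompt). You can check the state of the individual VSS writers by running the vssadmin list writers command from an elevated command prompt.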
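The writer check can be automated by parsing the text that vssadmin list writers prints. The sketch below is a hedged example in Python: it assumes English-language output in the usual "Writer name / State / Last error" layout, and the sample text is an abbreviated illustration rather than a capture from a real server. The function names are hypothetical.

```python
import re

# Illustrative, abbreviated sample of `vssadmin list writers` output.
# Real output also contains writer IDs and instance IDs, which are ignored here.
SAMPLE_OUTPUT = """\
Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'Microsoft Hyper-V VSS Writer'
   State: [8] Failed
   Last error: Timed out
"""


def parse_writers(text):
    """Turn vssadmin-style text into a list of {name, state, last_error} dicts."""
    writers = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        name_match = re.match(r"Writer name:\s*'(.+)'", line)
        if name_match:
            current = {"name": name_match.group(1), "state": None, "last_error": None}
            writers.append(current)
        elif current is not None and line.startswith("State:"):
            # "State: [1] Stable" -> "Stable"
            current["state"] = line.split("]", 1)[-1].strip()
        elif current is not None and line.startswith("Last error:"):
            current["last_error"] = line.split(":", 1)[1].strip()
    return writers


def unhealthy_writers(writers):
    """Return the names of writers that are not stable or that report an error."""
    return [
        w["name"]
        for w in writers
        if w["state"] != "Stable" or w["last_error"] != "No error"
    ]
```

On a Windows host, the input text could be collected with subprocess.run(["vssadmin", "list", "writers"], capture_output=True, text=True).stdout from an elevated session, and any writer returned by unhealthy_writers would be a candidate for closer inspection.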
3. The backup was taken at the file level: Most modern backup applications fully support virtual machine backups. However, these same backup applications generally also provide file-level backup capabilities. If a backup application is configured to back up a host server at the file level (as opposed to performing a virtual machine-specific backup), there is a good chance that the restore will fail, because file-level backups of running virtual machines are almost guaranteed to be in an inconsistent state.
4. Critical VM components were omitted from the backup: Virtual machines consist of more than just virtual hard drives. There is critical metadata and configuration data associated with every virtual machine. This data identifies the virtual machine and its hardware resource allocations.
In some organizations, it is common practice to store virtual machine configuration data and snapshot data separately from the virtual hard disks. In those situations, the backup application must be aware of the decentralized nature of the various virtual machine components. Otherwise some virtual machine components might not get backed up, which would make a virtual machine-level restoration impossible. The quick and dirty way to find out what happened would be to check the logs. The best way to find out is to verify the integrity of the backups through testing.
5. Storage quotas are being exceeded: Storage quotas are another common cause of virtual machine restoration failures. This can be particularly problematic in private cloud environments in which each tenant is allocated a limited amount of storage. Unless the remaining storage allocation is large enough to hold the restored virtual machine, the restoration might be impossible without temporarily adjusting the quota.
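A quick pre-restore check can catch this before the job starts. The following Python sketch assumes you already know the size of the virtual machine being restored and, where a quota applies, how much of the quota remains. The function names and the 10 percent safety margin are illustrative assumptions, and how you obtain the remaining quota depends on your platform.

```python
import shutil


def restore_fits(required_bytes, free_bytes, quota_remaining_bytes=None, headroom=0.10):
    """Return True if the restore fits in free space (and the quota, if given).

    `headroom` adds a safety margin on top of the raw size of the restored VM.
    """
    needed = int(required_bytes * (1 + headroom))
    if needed > free_bytes:
        return False
    if quota_remaining_bytes is not None and needed > quota_remaining_bytes:
        return False
    return True


def free_bytes_on(path):
    """Free space, in bytes, on the volume that contains `path`."""
    return shutil.disk_usage(path).free
```

For example, restore_fits(vm_size, free_bytes_on(target_path), quota_remaining_bytes=tenant_quota_left) would return False when either the volume or the tenant's quota is too small, where vm_size, target_path and tenant_quota_left are placeholders for values from your own environment.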
6. The host runs low on memory: When a virtual machine is restored, there are certain resources that are consumed on the host server while the restoration is taking place. Generally the restore operation will consume disk and network I/O, CPU cycles and memory.
The problem with this is that some organizations attempt to achieve the highest possible virtual machine density on each host in order to maximize the return on their hardware investment. If the host server is already low on resources, then a restore operation might fail as a result. This is particularly true of situations in which there is not sufficient memory available.
7. A virtual backup appliance live migrates: Some organizations configure their server virtualization infrastructure to dynamically shift workloads among the available virtualization hosts in response to demand. This can sometimes be a problem if the backup application is running on a virtual appliance.
A virtual appliance should theoretically be able to live migrate to another host while a backup operation is running without causing any problems. Sometimes, however, the migration process can cause a momentary loss of connectivity, thereby causing currently running jobs to fail.
8. The application is protected against a restoration: Another reason why a restore operation may fail is that some applications are protected against restorations. Exchange Server is a classic example of such an application. If you attempt to restore a mailbox database, the restore operation will fail unless you specifically give Exchange Server permission to overwrite the database. Although this type of protective mechanism should not prevent a virtual machine-level restoration, it can get in the way of restoring individual applications or databases within a virtual machine. This can happen in physical, virtual or mixed environments.
9. The restoration conflicts with a running VM: A virtual machine restoration can sometimes fail if the virtual machine that is being restored conflicts with a running virtual machine. The degree to which various types of conflicts can be handled varies from one backup application to another, but some of the conflicts that might cause a restoration failure include duplicate virtual machine names, duplicate virtual MAC addresses or duplicate operating system-level identifiers.
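Name and MAC address conflicts are easy to check for before a restore if you can export an inventory of running virtual machines. The Python sketch below assumes a hypothetical inventory format in which each VM is a dictionary with a "name" and a list of "macs". It compares both case-insensitively as a conservative choice, which may be stricter than your hypervisor requires.

```python
def find_conflicts(restored_vm, running_vms):
    """List (kind, value) pairs where the VM to restore clashes with a running VM."""
    conflicts = []
    restored_name = restored_vm["name"].lower()
    restored_macs = {mac.lower() for mac in restored_vm["macs"]}
    for vm in running_vms:
        if vm["name"].lower() == restored_name:
            conflicts.append(("name", vm["name"]))
        # Any MAC address present on both VMs is reported once per running VM.
        shared = restored_macs & {mac.lower() for mac in vm["macs"]}
        for mac in sorted(shared):
            conflicts.append(("mac", mac))
    return conflicts
```

An empty result means that no name or MAC conflicts were found in the inventory. Operating system-level identifiers are not covered by this sketch and would need a separate check.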
10. The underlying cause of the problem has not been resolved: Suppose for a moment that a VM becomes corrupt and you decide to restore that VM from backup. If the restoration fails, then it could be because the underlying cause of the corruption has not been addressed. For example, if the VM originally became corrupted as a result of a disk volume problem, but you did not take the time to fix the volume before attempting the restoration, then it is possible that the restoration could fail as a result of the volume’s state.
As you can see, there are a number of different factors that can cause a restoration failure. If a restoration fails, it is a good idea to check the backup application’s event logs for clues as to the cause of the problem.
Generally speaking, security-related log entries point to a password problem with the service account or a lack of sufficient permissions for either the backup operator or the service account. An agent failure or a more generalized failure message can be caused by problems with the Volume Shadow Copy Service. Likewise, communications failures typically point to hardware problems, while a read failure might indicate a bad tape or a dirty tape drive.
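The rules of thumb above can be turned into a small triage helper for a first pass over a failed job's log. This Python sketch is only an illustration: the keywords are hypothetical examples, not actual log strings from any backup product, and would need to be adapted to the messages your application writes.

```python
# Each rule pairs example keywords (assumptions, not real product log strings)
# with the likely cause described in the text above.
TRIAGE_RULES = [
    (("access denied", "logon failure", "permission"),
     "Check the service account password and the permissions of the "
     "backup operator and service account."),
    (("agent", "vss"),
     "Check the Volume Shadow Copy Service and the VSS writers."),
    (("communication", "connection", "timeout"),
     "Check for hardware problems such as a bad cable or I/O card."),
    (("read error", "read failure"),
     "Check for a bad tape or a dirty tape drive."),
]


def triage(message):
    """Return the hints whose keywords appear in a log message."""
    text = message.lower()
    return [
        hint
        for keywords, hint in TRIAGE_RULES
        if any(keyword in text for keyword in keywords)
    ]
```

A message that matches no rule returns an empty list, which is a signal to read the full log rather than evidence that nothing is wrong.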
Many customers have found that their recovery procedures are only as good as their last test, and that incomplete runbooks lengthen the time it takes to recover mission-critical applications, which adversely affects their business. Companies rely on easySERVICE™ Data Solutions as a partner in creating and managing the lifecycle of a long-term disaster recovery program, freeing them to pursue higher-value work.
Whether you are looking for easySERVICE™ Disaster Recovery Planning or solutions for your unique IT requirements, we have a solution specially designed for you. No matter the size of your business, you will always get the kind of support that goes far beyond the ordinary. At easySERVICE™ Data Solutions, we have collected our most noteworthy resources in one convenient disaster recovery guide to answer all your questions and help you decide if launching a DRaaS offering is the right move for you.