VMware ESXi cannot see the datastore after a restart – LUN recovery [Enterprise case]
When a drive starts throwing errors, shows up as RAW or makes the computer freeze, the key is to stop all writes and avoid actions that overwrite data.
Quick answer
- Do not format or initialize that drive, run CHKDSK on it, or install an operating system on it.
- If the drive slows down or freezes the system, disconnect it and do not keep stressing it with more attempts.
- The safest option is to create an image first (sector by sector) and recover only from the copy.
- Take it to a lab if you hear worrying sounds or the drive disappears, since either suggests a risk of mechanical damage.
Safe steps: step by step
- Stop using the drive and disconnect it.
- Do not run system repair tools on that drive (CHKDSK / “Repair”).
- If possible, create a sector-by-sector image onto another device.
- Work on the copy: scan and recover files from the image, not from the original.
- If the drive is unstable or noisy, hand it over to a professional lab (cleanroom).
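The imaging step above can be illustrated with a minimal sketch. It is not a replacement for dedicated imaging tools: it only shows the idea of copying in large chunks, falling back to single sectors when a read fails, and recording unreadable ranges instead of retrying them endlessly. The 512-byte sector size, the function names and the `FlakyDisk` class (a simulated source used only for the demo) are assumptions made for illustration.

```python
import io

SECTOR = 512  # assumed logical sector size


def _read_exact(src, offset, length):
    """Read exactly `length` bytes at `offset`, or raise OSError."""
    src.seek(offset)
    data = src.read(length)
    if len(data) != length:
        raise OSError("short read")
    return data


def image_device(src, dst, total_size, chunk_sectors=128):
    """Copy `total_size` bytes from src to dst.

    Unreadable sectors are zero-filled in the image and returned as a
    list of (offset, length) tuples so they can be reviewed later.
    """
    bad_ranges = []
    chunk = chunk_sectors * SECTOR
    offset = 0
    while offset < total_size:
        length = min(chunk, total_size - offset)
        try:
            data = _read_exact(src, offset, length)
        except OSError:
            # The chunk failed: retry it one sector at a time, once each.
            data = bytearray()
            for pos in range(offset, offset + length, SECTOR):
                n = min(SECTOR, offset + length - pos)
                try:
                    data += _read_exact(src, pos, n)
                except OSError:
                    data += b"\x00" * n
                    bad_ranges.append((pos, n))
        dst.seek(offset)
        dst.write(data)
        offset += length
    return bad_ranges


class FlakyDisk(io.BytesIO):
    """Simulated source that fails on chosen sectors (demo only)."""

    def __init__(self, payload, bad_sectors):
        super().__init__(payload)
        self.bad_sectors = set(bad_sectors)

    def read(self, size=-1):
        start = self.tell()
        if size is None or size < 0:
            size = len(self.getvalue()) - start
        first = start // SECTOR
        last = (start + max(size, 1) - 1) // SECTOR
        if any(s in self.bad_sectors for s in range(first, last + 1)):
            raise OSError("simulated read error")
        return super().read(size)


if __name__ == "__main__":
    payload = bytes(range(256)) * 8          # 4 sectors of test data
    disk = FlakyDisk(payload, bad_sectors={2})
    image = io.BytesIO()
    print(image_device(disk, image, len(payload), chunk_sectors=4))
```

In the demo, sector 2 of the simulated source is unreadable, so the image contains zeros for that sector and the function reports it as a bad range. Recovery then continues from the image, never from the original.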
Most common causes
- Bad sectors / platter surface degradation.
- Firmware / translator problems / Service Area errors.
- Electronics, power supply or port damage.
- File system errors after a power failure or interrupted write.
Details and explanation
Restarting a VMware server after a power outage is the kind of event that can paralyze an entire IT infrastructure. Imagine bringing the server back up only to discover that the datastore is gone and none of the virtual machines can start. Alarms in vSphere point to missing VMDK files, while stress and panic keep rising. Incidents like this are every IT administrator’s nightmare, especially when they are responsible for business-critical corporate data.
In this article, we take a detailed look at what can cause this situation and how to regain access to the LUN effectively so that normal operations can be restored as quickly as possible.
In our case study, we show how we recovered 40 virtual machines for a financial corporation in just 48 hours. Thanks to detailed analysis, expert tools and a well-planned procedure, we not only saved the data but also minimized financial losses that could easily have reached hundreds of thousands of złoty. Read on to see how the operation unfolded, how VMFS recovery works and how to prepare your own environment for a similar datastore incident. When arrays and NAS systems are involved, the safest route is to move straight to RAID data recovery, without blind rebuild attempts.
Why does the datastore disappear after a VMware server restart? Causes and symptoms
Restarting a VMware server after a power outage can leave the datastore missing or unavailable. The causes vary, but the most common are damage to the LUN pointer on the SAN array, multipath errors and RAID controller failures. If the partition table that describes the VMFS volume is damaged, the host cannot locate the volume and load its data correctly.
As a result, the datastore may appear in vSphere as unknown or may not appear at all, while the virtual machine icons turn grey and cannot be started.
Symptoms of datastore loss after a restart
Symptoms of datastore issues are usually clear: administrators may notice warnings in the ESXi logs such as Lost access to volume /vmfs/volumes/[UUID]. Even though the LUN may still be visible and online on the SAN array, the ESXi host may refuse to mount it, which means no access to the virtual machines and their VMDK files. In such a situation, it is critical not to take rash actions such as rescanning the HBA or removing the LUN, because this can destabilize the metadata and make the situation worse.
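As a read-only way to confirm the symptom, log files exported from the host can be searched for these warnings without touching the storage. The sketch below is an assumption-laden illustration: the exact wording and layout of ESXi log lines differ between versions, the sample lines are made up, and the function name is hypothetical. It counts "Lost access to volume" messages per volume identifier.

```python
import re

# Assumed message shape; real ESXi log lines may differ between versions.
LOST_ACCESS = re.compile(
    r"Lost access to volume\s+(?:/vmfs/volumes/)?"
    r"(?P<uuid>[0-9a-fA-F-]+)(?:\s+\((?P<name>[^)]*)\))?"
)


def find_lost_volumes(lines):
    """Return {uuid: {"name": ..., "count": ...}} for matching log lines."""
    hits = {}
    for line in lines:
        match = LOST_ACCESS.search(line)
        if not match:
            continue
        entry = hits.setdefault(
            match.group("uuid"), {"name": match.group("name"), "count": 0}
        )
        entry["count"] += 1
    return hits


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "2024-01-01T00:00:01Z Lost access to volume "
        "aaaaaaaa-bbbbbbbb-cccc-dddddddddddd (datastore-a) due to connectivity issues.",
        "2024-01-01T00:00:05Z unrelated message",
        "2024-01-01T00:00:09Z Lost access to volume "
        "/vmfs/volumes/aaaaaaaa-bbbbbbbb-cccc-dddddddddddd",
    ]
    for uuid, info in find_lost_volumes(sample).items():
        print(uuid, info["name"], info["count"])
```

Working on exported copies of the logs keeps the check passive: nothing is rescanned, remounted or written on the host.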
Step by step to LUN recovery: our proven procedure
In these cases, RAID/NAS reconstruction is based on rebuilding the actual layout and parameters of the array, not on guesswork.
Step by step: datastore recovery
To recover a lost datastore effectively, you need to act methodically and carefully. The first step is to analyze and secure the current state of the infrastructure. It is important to stop all repair attempts immediately, because accidental actions may cause further damage. Next, we verify the condition of the SAN array and confirm that the LUN is in an optimal state. We also recommend creating a full ESXi configuration backup using vSphere CLI and, if possible, taking a snapshot of the LUN on the array.
The next step is deep diagnostics, including disk imaging and RAID structure analysis. This allows us to reconstruct the array in a virtual environment and locate the beginning of the VMFS partition. If we find any damage to the partition table, we use specialist tools to repair VMFS according to its exact version.
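Locating the beginning of the VMFS partition in an image can be sketched as a signature search. The values used here are assumptions taken from what open-source VMFS tooling documents, not from this case: a volume header with the little-endian magic 0xC001D00D placed 0x100000 bytes (1 MiB) after the partition start. Verify both against the VMFS version you are dealing with before relying on them. The function name is hypothetical.

```python
import io
import struct

SECTOR = 512                    # assumed sector size
VOLINFO_OFFSET = 0x100000       # assumed header offset from partition start
VOLINFO_MAGIC = struct.pack("<I", 0xC001D00D)  # assumed magic, little-endian


def find_vmfs_candidates(image, image_size, step=SECTOR):
    """Return byte offsets where a VMFS partition may begin.

    Scans sector-aligned positions in a read-only image for the assumed
    volume header magic and subtracts the assumed header offset.
    """
    candidates = []
    last = image_size - len(VOLINFO_MAGIC)
    for pos in range(VOLINFO_OFFSET, last + 1, step):
        image.seek(pos)
        if image.read(len(VOLINFO_MAGIC)) == VOLINFO_MAGIC:
            candidates.append(pos - VOLINFO_OFFSET)
    return candidates


if __name__ == "__main__":
    # Synthetic 3 MiB image with a fake partition starting at sector 2048.
    part_start = 2048 * SECTOR
    raw = bytearray(3 * 1024 * 1024)
    raw[part_start + VOLINFO_OFFSET:part_start + VOLINFO_OFFSET + 4] = VOLINFO_MAGIC
    print(find_vmfs_candidates(io.BytesIO(bytes(raw)), len(raw)))
```

A hit is only a candidate: the magic can also appear inside file data, so each offset still has to be validated against the rest of the VMFS metadata. Seeking sector by sector is slow on multi-terabyte images; the sketch favours clarity over speed.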
It is extremely important to carry out all of these actions with caution in order to minimize the risk of further data loss during recovery. In cases of serious damage, we rely on proven methods that have repeatedly allowed us to restore virtual machines to operational condition efficiently.
Case study: how we recovered 40 virtual machines in 48 hours
The financial company faced a crisis that could have crippled its operations. After a planned VMware server restart, two out of five datastores disappeared, making it impossible to start forty virtual machines. In a rushed attempt to fix the problem, the administration team rescanned the storage adapter, which unfortunately damaged the partition table. Because speed is critical in incidents like this, our team started emergency actions immediately.
The first four hours after the report were spent collecting the drives from the client’s office in Warsaw. We then focused on imaging 24 drives, each with a capacity of 2 TB. The next phase was a virtual reconstruction of the RAID 10 array and VMFS repair, which took another eight hours.
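To give a sense of what a virtual RAID 10 reconstruction computes, here is a minimal sketch of the address arithmetic. The layout is an assumption, not the client's actual configuration: a single RAID 10 set, drives ordered as adjacent mirrored pairs (0 and 1, 2 and 3, and so on), data striped across the pairs, and a 64 KiB stripe size. In real work these parameters are determined from the drive images. Function names are hypothetical.

```python
def raid10_usable_tb(n_disks, disk_tb):
    """Usable capacity of one RAID 10 set: half of the raw capacity."""
    return n_disks // 2 * disk_tb


def raid10_locate(logical_offset, n_disks, stripe_size):
    """Map a logical byte offset to (pair index, mirror disks, member offset).

    Assumes adjacent mirrored pairs with stripes laid out pair by pair.
    """
    pairs = n_disks // 2
    stripe_no, within = divmod(logical_offset, stripe_size)
    row, pair = divmod(stripe_no, pairs)
    member_offset = row * stripe_size + within
    return pair, (2 * pair, 2 * pair + 1), member_offset


if __name__ == "__main__":
    STRIPE = 64 * 1024  # assumed stripe size
    print(raid10_usable_tb(24, 2))                    # 24 drives of 2 TB
    print(raid10_locate(13 * STRIPE + 100, 24, STRIPE))
```

Under these assumptions, 24 drives of 2 TB give 24 TB of usable space, and every logical block can be read from either drive of its mirrored pair. That matters when one image of a pair contains unreadable sectors.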
We divided the entire process into staged tasks to maximize efficiency. In the end, after 48 hours, we restored all 40 virtual machines, reducing downtime to only eight hours and saving the company an estimated PLN 500,000 in losses.
Having a similar problem with your storage device?
If your drive is no longer detected, the computer reports read errors, or you have lost access to important files, do not repeatedly run repair software. This can worsen the condition of the device and make data recovery harder.