Now I stare into space, wondering why I didn't buy a backup UPS.

I have two ESX 3.5 hosts running on Sun X4600 M2s and an EMC SAN. What a wonderful life, I used to say to myself. But those nice admin days were gone with the blackout. A few days ago the building automation manager called to tell me there would be an operation on the UPS, so that I could shut down all systems to reduce data loss in case something went wrong. I did what I had to do and shut down all systems, including the backbone and network switches.

The operation finished, I brought the systems back up, and everything was working fine. Another beautiful admin day...
The next morning my systems engineer called to tell me that all systems were down. End of days...

When we tracked down the cause, we found that one day after the UPS operation a power blackout had occurred and all systems had been reset.

So a checklist emerged:
1- Check the ESX logs and search for errors or warnings.
2- Check the SAN.
3- Check the VMs.

1- Unfortunately we found a lot of SCSI I/O error lines in the logs.
2- The SAN was fine (God bless EMC, it has a UPS inside; even if you pull the plug it shuts down cleanly).
3- This was the big problem. Some VMs appeared as VM1 (invalid) and VM2 (obsolete).
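
A minimal sketch of the log search from item 1, not the exact commands we ran. It assumes the VMkernel log is at /var/log/vmkernel on the service console and that the relevant lines mention both "scsi" and "error"; the function name scsi_errors is made up for illustration.

```shell
#!/bin/sh
# Sketch: pull SCSI-related error lines out of a log file.
# Assumption: the interesting lines contain both "scsi" and "error"
# (case-insensitive). Adjust the patterns to what your logs actually say.

scsi_errors() {
    # $1 = path to a log file
    grep -i 'scsi' "$1" | grep -i 'error'
}

# Example use on the host (the log path is an assumption):
# scsi_errors /var/log/vmkernel | wc -l
```

Counting the matches first gives a quick feel for how bad things are before reading the lines one by one.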

To solve problem 1, we tried fsck. All went fine.
Solving problem 3 turned into the worst day of my life.

First I tried to remove the VMs from the inventory and then re-add them by browsing the VMFS volumes. It did not work. Then I connected directly to the ESX host using the VI Client and tried the same thing, but that didn't work either.

The "vmware-cmd -l" service console command listed only 3 VMs, but when I pinged the other servers I was still able to reach them.
Then I ran "cat /proc/vmware/vm/*/names" to list the running VMs on the ESX host a different way, and saw that 7 VMs were up and running.
The VMs were working, but I could not reach them from the VI Client.
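
The comparison of the two listings can be sketched as a small script. This is an illustration under assumptions, not what I typed at the time: it assumes each line of /proc/vmware/vm/*/names contains a cfgFile="..." field with the path to the .vmx file, and that "vmware-cmd -l" prints one .vmx path per line. The function names and the /tmp file names are made up.

```shell
#!/bin/sh
# Sketch: find VMs the VMkernel reports as running that vmware-cmd does not list.
# Assumption: names lines look like
#   vmid=... cfgFile="/vmfs/volumes/.../VM1.vmx" ... displayName="VM1"

# Read names-style lines on stdin, print the unique .vmx paths.
running_cfgs() {
    sed -n 's/.*cfgFile="\([^"]*\)".*/\1/p' | sort -u
}

# Print every path in file $1 (running) that is not in file $2 (listed).
unregistered() {
    grep -Fxv -f "$2" "$1"
}

# Example use on the host:
# cat /proc/vmware/vm/*/names | running_cfgs > /tmp/running.txt
# vmware-cmd -l | sort -u > /tmp/registered.txt
# unregistered /tmp/running.txt /tmp/registered.txt
```

The match is on exact paths, so if one listing uses the datastore label and the other the volume UUID, the paths need to be normalized first.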

At this stage, the only option was to connect to each running VM by RDP and shut it down, then create a new VM and attach the old VM's disk to it.
I still have not figured out how things blew up, but now I know that the first item on my checklist is to buy a backup UPS for the ESX hosts and the SAN.

Beautiful admin days...:)


 