…an ironic tale of diligence gone bad.
Somehow I have been given the job of making sure that the automated builds of our products are performed each night. Included with this unwanted task was the job of creating the installers for the products. As someone who writes software and QA’s software, I have learned not to trust any software, whether it is written by Microsoft or Joe Blow in his garage.
The other day we received a copy of a new installer authoring tool, InstallAnywhere, which the higher-ups have mandated that we use for creating new installers. Having been bitten before by trusting that installing one product would not disrupt other products, I decided to be diligent and make sure I could get the build server back to a working state if the new installer software caused issues.
I asked the lab manager if it would be a big deal to “ghost” the machine, which involves using a product called Norton Ghost 2003 to completely back up the operating system. In theory, if something goes wrong, you can restore the server to its previous state from the ghost image, as if you had never done something stupid, like installing new software on a production server. The lab manager said it would be no big deal and started the process for me by installing Norton Ghost, walking through the wizard, and choosing all the correct configuration options.
“Okay, when you are ready to start the backup, click the finish button,” he said to me.
I sent out an email informing everyone that I would be shutting down the build server at 2:00 pm to ghost it, then waited until 2:00 and clicked the finish button. The server shut down and rebooted to DOS to run the Ghost backup program.
“Does it normally just hang there for a while?” I asked.
“Uhhhh, No!” he said. “You have fun with that.”
“It shouldn’t be a big deal,” I said confidently.
“Glad you think so. I’m just glad we backed up source control the other day.”
I gave Ghost a little while longer to try to start, then rebooted the server. The system again launched to DOS, told me the backup had failed, and then offered to reboot to Windows for me. Great, I thought, I can’t back up the server, but at least it will recover nicely.
The attempted reboot to Windows just sat there. After about 5 reboots and trying various things, I started to get worried.
“Should we just wipe the drive and reinstall the OS?” the lab manager asked.
“Uh no, that’s why I wanted to ghost the machine, because it is configured correctly to perform the builds.”
I googled it and found some horror stories. The recommended solution was to install another hard drive, boot from that, and then try to fix the system. That didn’t sound like fun, so I tried the NT repair disk, to no avail.
I started to search Norton’s site and worked through several of their solutions. Still no luck. Eventually I found one that suggested using their gdisk utility. After multiple attempts with gdisk (and many more reboots because the system hung), I eventually got gdisk to tell me some useful information. It still couldn’t fix the problem for me, but it at least told me that Ghost had created a temporary DOS partition and that the machine was booting from it. It just wouldn’t let me switch the active partition.
*** LIGHT GOES ON ***
“Do you have a bootable floppy with fdisk on it?” I asked the lab manager.
“That can’t be good,” he said, handing me a Windows 98 bootable floppy.
“No, I have an idea.”
The machine booted to the floppy and I launched fdisk. Fdisk found 3 partitions, and it was pretty easy to determine which one was my Windows partition: the biggest one. I switched it to the active partition and rebooted.
3:20 PM – the build machine was alive and well. Not backed up, mind you, but at least not destroyed.
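For anyone curious what “switching the active partition” actually changes, here is a rough sketch in Python. It assumes a classic MBR-style partition table, which is what fdisk edits: four 16-byte entries starting at byte 446 of the first sector, each with a status byte that is 0x80 for the active partition. The function names are my own for illustration, and the code only works on a 512-byte blob in memory. It is not what fdisk or Ghost does internally, and it never touches a real disk.

```python
import struct

ENTRY_OFFSET = 446   # partition table starts here in the 512-byte MBR sector
ENTRY_SIZE = 16      # four entries of 16 bytes each
ACTIVE_FLAG = 0x80   # status byte value that marks the active (boot) partition


def read_partitions(mbr: bytes):
    """Return the non-empty partition entries from an in-memory MBR sector."""
    if len(mbr) < 512 or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    parts = []
    for i in range(4):
        start = ENTRY_OFFSET + i * ENTRY_SIZE
        entry = mbr[start:start + ENTRY_SIZE]
        # status byte, 3 CHS bytes (skipped), type byte, 3 CHS bytes (skipped),
        # starting LBA, sector count (both little-endian 32-bit)
        flag, ptype, lba_start, sectors = struct.unpack("<B3xB3xII", entry)
        if ptype != 0:  # type 0 means the slot is unused
            parts.append({
                "index": i,
                "active": flag == ACTIVE_FLAG,
                "type": ptype,
                "lba_start": lba_start,
                "sectors": sectors,
            })
    return parts


def set_active(mbr: bytes, index: int) -> bytes:
    """Return a copy of the sector with only entry `index` flagged active."""
    if mbr[ENTRY_OFFSET + index * ENTRY_SIZE + 4] == 0:
        raise ValueError("cannot activate an empty partition slot")
    out = bytearray(mbr)
    for i in range(4):
        out[ENTRY_OFFSET + i * ENTRY_SIZE] = ACTIVE_FLAG if i == index else 0x00
    return bytes(out)
```

In my case the small temporary DOS partition was the one left flagged active, so the machine kept booting into it. Picking the entry with the largest sector count and flagging that one instead is the in-memory equivalent of what I did by hand in fdisk.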
So ironically, the software I was using to protect me from problem software turned out to be the biggest problem of all.
KEY SEARCH PHRASES:
- GHOST 2003
- Restart Windows
- Norton Ghost “return to windows”
- Norton Ghost fail return windows