I’m sitting here with a browser in one window and a text console to an installing system in another. Why? Because I’m waiting for the installation to finish. I’ve been debugging an odd bcfg2 failure during kickstart post-install for our provisioning system. It started last night when I left the office. I’d just fired off a reinstall of an IAM system to verify that it would install correctly from the production kickstart (I’d just pushed the first real production bits out to it).
This morning, I got in only to stare at a console still stuck in the kickstart post-install. Sigh. Ok, dig around to find the magic remote rescue arcana so I can poke around for the logs. See that two files aren’t binding correctly in bcfg2, which potentially croaked the install (it certainly looked like it hung, that’s for sure). Get the kickstart updated to use the “right” profile for now.
Reboot, reinstall. Lather. Rinse. Repeat.
Ok, kickstart is completing successfully! Yay! Confetti and champagne for everyone!
Hey, grub doesn’t have the right setup. Easy fix in the repo by moving the TGenshi template processing into the right group. Go to run a quick update on the IAM system and ... hey, where’s bcfg2?
No wonder post didn’t error out. It didn’t actually do anything! Well ... it did. It errored out on yum because ... the rpmforge repo got corrupted. Why did it get corrupted? Well, it appears that the stable thing we’ve been doing for months is now broken because the repository hosting the rpmforge GPG keys and yum repo setup isn’t answering requests.
Fix the URL, reinstall, and now I’m back where I started early this morning: a broken bcfg2 config that stalls out in post.
I love four hour snipe hunts.
At least we know where we need to fix some things, including:
- Pulling the rpmforge-repo rpm locally
- Possibly mirroring all of rpmforge for the repos we need
- Better error handling in the kickstart post
- Better portholes into the post-install to see where errors are. Like, why isn’t our safety shell starting on tty2 like it should be?
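For the error handling and porthole items, here is a rough sketch of the kind of thing I have in mind, not what our kickstart actually does. It assumes Python 3 is available in the install environment, and the function name `run_steps` and its log format are made up for illustration. The idea is to run each post step in order, log the command and its exit code, and stop at the first failure so a dead yum can’t silently skip everything after it.

```python
import subprocess


def run_steps(steps, log_path):
    """Run (name, argv) steps in order, logging output and exit codes.

    Returns the name of the first step that fails, or None if all succeed.
    (Hypothetical helper for illustration; not part of kickstart or bcfg2.)
    """
    with open(log_path, "a") as log:
        for name, argv in steps:
            # Record what is about to run, and flush so the entry lands
            # in the log before the child process writes its own output.
            log.write(f"=== {name}: {' '.join(argv)}\n")
            log.flush()
            result = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
            log.write(f"=== {name}: exit {result.returncode}\n")
            log.flush()
            if result.returncode != 0:
                # Stop here instead of carrying on past a broken step.
                return name
    return None
```

A post script could hand this a list of steps (say, the yum install of bcfg2 followed by the bcfg2 run) along with a log path of its choosing, and bail out loudly with the name of whichever step comes back. Tailing that log from a rescue shell would show exactly where things stopped.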