Double check! Always!

February 24, 2010

Today I learned the hard way how important it is to check everything at least twice.

I had to patch the OS on a two-node NFS cluster: a couple of BL860C servers running HP-UX 11.31 with three packages, one for development, one for testing and the third for production.

Everything was fine until I tried to run one of the packages on its failover node so I could patch its primary: the package didn't start. Its log filled up with one error after another. Of course I tried to identify the source of the errors… nothing. I read the config files line by line… nothing. At that point I started to sweat, and just when I was thinking about writing my resignation letter, I noticed it. Somehow an extra damn "equals" sign had slipped into the hanfs.sh file, so of course the cluster didn't recognize the sharing options and refused to start the package.
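One cheap way to double-check an edit like this is to lint the file before moving the package. The sketch below is hypothetical and does not parse the real hanfs.sh format. It only flags shell-style assignments (including array elements such as `NAME[0]=...`) whose value begins with a stray "=". In a POSIX shell, `VAR==value` is still valid syntax and quietly sets VAR to "=value" instead of failing, so a mistake of this kind won't show up as a syntax error.

```python
import re
import sys

# Hypothetical check: matches shell-style assignments such as NAME=value
# or NAME[0]=value. It does NOT understand the real hanfs.sh layout.
ASSIGN = re.compile(r'^\s*([A-Za-z_][A-Za-z0-9_]*(?:\[[0-9]+\])?)=(.*)$')


def find_double_equals(lines):
    """Return (line_number, text) for assignments whose value starts with '='."""
    suspicious = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # ignore comment lines
        match = ASSIGN.match(line)
        # Strip leading quotes so both VAR==x and VAR="=x" are caught.
        if match and match.group(2).lstrip("\"'").startswith("="):
            suspicious.append((number, stripped))
    return suspicious


def main(path):
    with open(path) as handle:
        hits = find_double_equals(handle.read().splitlines())
    for number, text in hits:
        print(f"{path}:{number}: suspicious extra '=': {text}")
    return 1 if hits else 0  # non-zero exit when something looks wrong


if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
```

Saved under any name you like (for example a made-up `check_equals.py`), it takes the file path as its only argument and exits non-zero when it finds something suspicious, so it can go into a pre-change checklist. It won't catch every typo, but it would have caught this one.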

In fact the downtime was no more than a few minutes, and fortunately for me it wasn't in the production package, but excuses aren't valid, at least not for me. I was so confident in my expertise that I made a newbie mistake, probably while modifying the package after creating a new filesystem.

I have to be more careful in the future.

Juanma.


3 responses to Double check! Always!

  1. 

    Thanks for the wake-up call. Seems like every few months I need a good anecdote to tell me that I’m never so experienced that I can make changes without double-checking (or documenting) them.

  2. 

    Thanks for your comment, Wesley. It's sad but true: we sysadmins tend to be self-confident and a little arrogant, and we really need these wake-up calls from time to time.

  3. 
    big.sunflower March 8, 2010 at 08:56

    HP Cluster Consistency Monitor tool (CCMon)
    http://h71028.www7.hp.com/enterprise/downloads/ccmon-service-brief.pdf

    might be worth a look.

    "… delivers a diagnostic tool for spotting potential disruptions to critical applications. In this service, an HP engineer installs and configures the HP Cluster Consistency Monitor tool (CCMon) on a single cluster. During the configuration process, the engineer profiles the resources required for the running application and brings the entire cluster toward a consistent state. The HP engineer works with your IT staff in this process and provides a report on the resource profile and found configuration differences as a way to introduce you to CCMon features.

    The HP engineer sets up CCMon as a background process that continuously checks thousands of data points for configuration differences. CCMon running in automatic operation triggers alarms when changes occur, and generates scheduled HTML or ASCII reports to provide advance warning of possible failures without the need for switchover testing."
