This post is written by a special guest blogger, my father, John Erickson, who had a long career as an aerospace engineer with Honeywell. In response to my earlier post on the pen that leaked, he wrote this reflection, an interesting take on the risks and evaluation of equipment failure:
I read of the case of the failed pen in the January 15 blog post. The decision to continue using a pen at risk of failing is no doubt acceptable, considering that the impact of future failures is limited to possibilities such as a stain on an important document or damage to a valued item of clothing. However, the story led me to imagine that this pen was a mission-critical part of an upcoming space mission. The incident that comes to mind is the case of the leaking O-rings, wherein NASA management chose to risk the effect of cold weather on the O-rings, which then leaked and failed to contain the hot gases, thus bringing down the Space Shuttle Challenger and killing its seven astronauts, including Christa McAuliffe, the first member of the Teacher in Space Project.
If your pen were considered mission-critical, it would probably be impounded for subsequent analysis, and you would be barred from using other pens from this company until it was determined that the failure was somehow unique to this individual pen. Your pen would then be the subject of extensive failure analysis to determine the cause and the possible corrective actions. This would include the ink as well as the pen proper: has the ink deteriorated and lost its intended viscosity, is it temperature sensitive, etc. The pen barrel would be analyzed to determine whether it had been overstressed and cracked; the ball itself would be investigated to determine whether it had been deformed, or whether its retainer had been damaged, either of which could allow leakage. Then, with the probable cause of the leak identified, the questions become: what is the most effective corrective action, and what kinds of tests are needed to establish that future leaks will not happen? The investigation would rely on an extensive set of methods and tools that are probably continually being improved. (See the Wikipedia article entitled "Failure Analysis.")
A personal experience that I remember well: It was around 1958, and Honeywell was supplying the guidance system for the Titan Intercontinental Ballistic Missile. The system included a timer, which contained a number of timed switching events that initiated actions and maneuvers of the missile. During a particular test, the timer displayed a one-time, non-repeatable failure in one of the timed events. I was a systems engineer on the project, and it seems that we spent literally weeks investigating and subjecting this device to all sorts of stress tests in attempts to repeat and isolate the problem. We finally identified the spot where temperature stresses caused an intermittent short in a circuit function, which led to corrections that prevented such malfunctions in the future.
This has been a long comment in response to the original blog post, but it’s been fun to imagine this other, far-fetched type of situation, based somewhat on my work experiences.
Thanks for this blog contribution, Dad! Perhaps you will now be starting your own blog.