
Feedback: Software Development Paradigm Trap

John Taylor - 8/3/2006

The important questions, in the end, are the ones I ask the people that work for me when they ask me "Is it good enough?":

1. Is it wrong of me to want perfection from this project?
2. Is it wrong of me to expect perfection from this project?

Invariably, the answer is no!

When a phone being used by a customer in Washington state shut off just after being powered up, in what seemed an arbitrary way, while connected to a battery that seemed to be adequately charged, and gave "Low battery" as the reason, production and shipping ground to a halt. After several weeks of debugging, the phone's project group had found nothing wrong and, in an eleventh-hour panic, called on me to help.

The biggest clue was, for me, also the most obvious: the user had learned that if he put the cold phone in his pocket for five minutes, the problem went away no matter how cold the phone subsequently became.

A modest amount of detective work showed that the software in the phone had read the thermistor and come up with an ambient temperature that was forty degrees Celsius higher than the minus ten the phone was actually experiencing, which caused the wrong remaining-capacity-versus-temperature table to be consulted. Two batteries were available for this phone model, identified by two "slightly" overlapping thermistor transfer curves. Within the overlap, the software would make a choice based on its previous information about the temperature: for example, if the reading had definitely been on curve 1 above the overlap, it must still be on curve 1 within the overlap. Unfortunately, if the phone powered up in the overlap area there was no previous information to go on, and the code fell back on a boot default which pointed, in this case, to the wrong transfer curve.

When asked why a board-mounted thermistor for the PA was not consulted to cross-check an uncertain (overlap) reading from the battery, the answer was "Yeah, we thought about that, but decided phones would never (or very-very rarely) be booted in the overlap area so we didn't bother to code for it" (see "Corner case"). The cross-check took one extra line of C and the problem went away forever.
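For readers who want to see the shape of such a cross-check, here is a minimal sketch in C. It is not the code from the phone: the 8-bit ADC width, the overlap window of counts 120 to 139, the linear transfer curves, and every name (select_curve, battery_temp_c, pa_temp_c and so on) are assumptions made up for illustration. The only idea taken from the story is that an ambiguous battery reading is resolved by asking which candidate temperature lies closer to the board-mounted PA thermistor, so that no boot default is needed.

```c
#include <stdint.h>
#include <stdlib.h>

/* Everything numeric here is hypothetical: real transfer curves would
   come from the thermistor and battery data sheets. */
typedef enum { CURVE_1 = 1, CURVE_2 = 2 } curve_t;

#define OVERLAP_LO 120u   /* assumed first ADC count shared by both curves */
#define OVERLAP_HI 139u   /* assumed last ADC count shared by both curves  */

/* Assumed linear curves, in degrees Celsius. Inside the overlap the two
   candidate temperatures differ by 40 degrees, as in the story. */
int curve1_temp_c(uint8_t adc) { return ((int)adc - 120) / 2 - 10; }
int curve2_temp_c(uint8_t adc) { return (int)adc / 2 - 30; }
int pa_temp_c(uint8_t adc)     { return (int)adc / 2 - 40; }

/* Decide which battery curve a reading belongs to. Outside the overlap
   the reading alone is decisive. Inside it, pick the curve whose decoded
   temperature is closer to the board temperature (ties go to curve 1). */
curve_t select_curve(uint8_t batt_adc, uint8_t pa_adc)
{
    if (batt_adc < OVERLAP_LO) return CURVE_2;
    if (batt_adc > OVERLAP_HI) return CURVE_1;

    int board = pa_temp_c(pa_adc);
    int d1 = abs(curve1_temp_c(batt_adc) - board);
    int d2 = abs(curve2_temp_c(batt_adc) - board);
    return (d1 <= d2) ? CURVE_1 : CURVE_2;
}

/* Battery temperature decoded with the selected curve. */
int battery_temp_c(uint8_t batt_adc, uint8_t pa_adc)
{
    return (select_curve(batt_adc, pa_adc) == CURVE_1)
               ? curve1_temp_c(batt_adc)
               : curve2_temp_c(batt_adc);
}
```

With these made-up curves, a battery reading of 120 counts decodes to either minus 10 or plus 30 degrees; a PA reading of 60 counts (minus 10 degrees on the assumed board curve) settles it in favor of curve 1. Because the decision depends only on the two current readings, it behaves the same at power-up as at any other time.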

Moral of the story? The answer to questions 1 and 2 is always no!

John Taylor
Hardware Engineer
Taylor Electronics
(Kyocera Wireless Corp)

Mark Bereit - 8/6/2006

John,

Thank you for your comments! That's a good story, and a great attitude toward successful product design. The trick, of course, is coming up with the thought processes and development approaches that can get us ahead of mushrooming project complexity...

John Taylor - 8/7/2006

I didn't include the part where I had to send the software "Engineer" out of my office with the admonition to not return until the decision made by his 10 line function was always the correct one regardless of the combination of the 2 inputs, i.e. all possible combinations of the battery temperature ADC and the PA temperature ADC. To this end, I made him write a test routine that printed out all 66k combinations, the function results and the expected results... after 3 attempts he finally gave up and I had to do it for him to get the project off ship hold.
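A test routine of the kind described above might look like the following sketch in C. It reads "66k combinations" as 256 x 256 = 65,536, which assumes two 8-bit ADC readings; that width, the conversion formulas, the overlap window of counts 120 to 139, and all function names are hypothetical and are not the original code. The harness walks every input pair, optionally prints the inputs, the result and the expected result, and counts disagreements. Two candidate functions are included: one that falls back on a fixed default inside the overlap (an invented stand-in for the flawed behavior) and one that cross-checks against the PA reading.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

typedef int (*decide_fn)(uint8_t batt_adc, uint8_t pa_adc);

/* Hypothetical conversions to degrees Celsius. */
static int curve1_c(uint8_t adc) { return ((int)adc - 120) / 2 - 10; }
static int curve2_c(uint8_t adc) { return (int)adc / 2 - 30; }
static int board_c(uint8_t adc)  { return (int)adc / 2 - 40; }

/* Candidate A: ignores the PA reading and uses a fixed default (curve 2)
   inside the assumed overlap of counts 120..139. */
int decide_boot_default(uint8_t batt_adc, uint8_t pa_adc)
{
    (void)pa_adc;
    return (batt_adc > 139) ? 1 : 2;
}

/* Candidate B: inside the overlap, picks the curve whose decoded
   temperature is closer to the board temperature (ties go to curve 1). */
int decide_cross_checked(uint8_t batt_adc, uint8_t pa_adc)
{
    if (batt_adc < 120) return 2;
    if (batt_adc > 139) return 1;
    int t = board_c(pa_adc);
    return (abs(curve1_c(batt_adc) - t) <= abs(curve2_c(batt_adc) - t)) ? 1 : 2;
}

/* Expected result, written a different way on purpose: the candidate
   temperatures are 40 degrees apart, so the board temperature is simply
   compared with the midpoint between them. */
int expected_curve(uint8_t batt_adc, uint8_t pa_adc)
{
    if (batt_adc < 120) return 2;
    if (batt_adc > 139) return 1;
    int midpoint = curve1_c(batt_adc) + 20;
    return (board_c(pa_adc) <= midpoint) ? 1 : 2;
}

/* Run every one of the 65,536 input pairs through both functions.
   If out is not NULL, print one row per pair. Returns the number of
   pairs on which the function under test disagrees with the oracle. */
unsigned long exhaustive_check(decide_fn under_test, decide_fn oracle, FILE *out)
{
    unsigned long mismatches = 0;
    for (unsigned b = 0; b <= 255; b++) {
        for (unsigned p = 0; p <= 255; p++) {
            int got  = under_test((uint8_t)b, (uint8_t)p);
            int want = oracle((uint8_t)b, (uint8_t)p);
            if (got != want) mismatches++;
            if (out != NULL)
                fprintf(out, "%3u %3u %d %d%s\n", b, p, got, want,
                        got != want ? "  <-- MISMATCH" : "");
        }
    }
    return mismatches;
}
```

Calling exhaustive_check(decide_cross_checked, expected_curve, stdout) prints the full table and should return zero, while the fixed-default candidate reports mismatches for cold-board readings inside the overlap. The check is only as good as the expected-result function, which is why it is worth writing that one independently of the code under test.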

I can't tell you how many times an obvious problem was not dealt with because an engineer convinced the people above him that it was a "Corner case" that was very unlikely to come up during "Normal operation." At my company, they like to do a "Smoke test" followed by a "Wow, this code didn't break" test. For the latter, if you set the phone on a bench and it doesn't crap itself before the battery runs down, you ship it! We just had a company-wide edict that all employees must now carry one of the latest model phones and report any problems so that we can get more "Test" hours to fill a hole in our product quality dam... I emailed the CEO thanking him for not opening a line of hand grenades!

Does no one remember the old saying "A stitch in time saves nine"?

That last reference actually answers your basic question about why we ("we" loosely used, not I?) have become satisfied with a metric that insists that any software project will have X number of bugs per Y lines of code. It's easy to convince yourself that a bug in a phone UI won't kill anyone because it probably won't, so I can almost understand why people will let a "Glitch" slide without explanation. I did say almost! We as a profession need to start worrying about things we might have missed, start feeling bad for every error that we find ourselves and start feeling especially bad for every error that eventually does show itself; bad enough to remember to never do it again.

In general, my number one rule over the last 30 years has been "There is no magic": A system can't be expected to perform by mysterious means but must be designed through solid understanding of, and compliance with, the underlying rules (physical, electrical, timing etc). If a system design violates a single hard and fast rule then you must redesign the system or modify your expectations. I am a strong believer in prototyping all the mechanics, hardware and software that I'm working on, and if the real device doesn't work as expected and doesn't match the simulation then I'll work until they do... This just happens to be a genetically imposed constraint and so it has been very hard to teach, but I keep trying.

To this end, distributed processing has been one of my favorite tools. Your article describes it in a hypothetical manner, but it holds in reality for a single circuit just as for a single code block: if you can devote enough time to the task that all possible inputs are accounted for and only desired outputs are delivered, then these tested blocks can be built into larger structures like so many Legos.

I could go on for hours but I have another patent to write this morning...