Feedback: Software Development Paradigm Trap

Francois Audeon - 8/4/2006

Hi Mark,

I work as lead SW architect for the set-top box and home media devices business line at Philips Semiconductors, and I wanted to share some reactions to your (excellent) article published in Embedded Systems Design.

Why does mechanical engineering perform better than we do?

I think there are three major reasons.

The first is the universality of the rules of mechanics. The interactions of the atoms in a piece of steel obey universal rules that you can learn once and for all. In a software system this is anything but the case: we have a whole collection of different languages, different instruction sets, different processors, and different execution spaces, so in the end, even if two pieces (i.e. two software components, whatever they are) look the same (same interfaces, same specifications), we cannot guarantee that they will react the same way even when stressed under similar conditions. There may be good reasons for introducing new languages or cores, but this doesn't help us: it keeps creating ever more diversity and invites new bugs.

The second reason is that in a mechanical system you can generally predict which parts of the system will be affected by a particular failure. As you pointed out in your article, this is because those systems are made of smaller subsystems with physical barriers between them. I agree that this is exactly what we are missing. Most often our subsystems share the same resources, so there are few or no physical barriers, and you cannot predict to what extent a particular bug will affect the rest of the system. Is running each individual task on a dedicated processor the solution, though? It might help, but it will not solve the problem entirely if other resources, such as memory or I/O lines, are still shared. Here too we have to think about creating secure frontiers. An MMU helps a bit, but ideally you would want to execute each task on a physically independent subsystem with its own set of HW resources. We are not there yet...
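
To make this concrete, here is a minimal C sketch (the components and their layout are invented for illustration) of what "no physical barriers" means when two components share one address space:

    #include <stdio.h>
    #include <string.h>

    /* Two "components" living in the same address space, with no barrier
     * between their data. The layout is deliberate here to make the effect
     * visible; in a real system the victim is whatever happens to be nearby. */
    struct shared_space {
        char a_buffer[8];   /* owned by component A */
        int  b_state;       /* owned by component B */
    };

    static struct shared_space mem = { "", 42 };

    static void component_a_copy(const char *input)
    {
        /* Bug in A: no bounds check. This is undefined behavior in C,
         * which is exactly the point: nothing stops the overrun. */
        strcpy(mem.a_buffer, input);
    }

    int main(void)
    {
        printf("B's state before A's bug: %d\n", mem.b_state);
        component_a_copy("far longer than eight bytes");
        printf("B's state after A's bug:  %d\n", mem.b_state);
        return 0;
    }

An MMU would at least confine the damage to component A's own process; without one, B's state is simply collateral damage of A's bug.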

The third reason is input diversity. The specifications of the oil you put in your car are very similar, whatever the brand, and people accept that the engine might fall apart if they pour in, say, water instead. For a DVD player (to take the same example you used) things are much less simple. I can tell you, having worked in this domain for five years, that many discs do anything but comply with the DVD-Video specification. Yet because these are top-ten titles, people expect their DVD player to play them correctly. That creates a great deal of diversity in the inputs your system has to digest. And fault tolerance alone is not a proper answer, because people expect these discs to play without hiccups. So a proper test strategy is crucial too. Since you cannot hope to test every possible input, you have to understand where your system can break and stress those particular weak points. The hard part is specifying the margins the system must be able to cope with.
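
As a sketch of what I mean by stressing the weak points rather than testing every input, imagine a header parser (the parse_disc_header() below is a stand-in invented for the example) being fed deliberately corrupted variants of a known-good header:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define HEADER_SIZE 64

    /* Stand-in for real parsing code: accept only headers that start
     * with the right magic bytes, reject everything else. */
    static int parse_disc_header(const unsigned char *buf, size_t len)
    {
        if (len < 4 || memcmp(buf, "DVD1", 4) != 0)
            return -1;  /* reject: out of spec */
        return 0;       /* accept */
    }

    int main(void)
    {
        unsigned char good[HEADER_SIZE] = "DVD1";  /* known-good header */
        unsigned char mutated[HEADER_SIZE];
        srand(12345);   /* fixed seed, so any failure is reproducible */

        for (int i = 0; i < 100000; i++) {
            memcpy(mutated, good, HEADER_SIZE);
            /* Flip a few random bits: a cheap model of out-of-spec discs. */
            for (int f = 0; f < 3; f++)
                mutated[rand() % HEADER_SIZE] ^= (unsigned char)(1u << (rand() % 8));
            /* The requirement: reject or recover, but never crash or hang. */
            (void)parse_disc_header(mutated, HEADER_SIZE);
        }
        puts("survived 100000 mutated headers");
        return 0;
    }

The fixed seed matters: when a mutated input does break the parser, you can replay exactly the same sequence and debug it.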

Ultimately, we have to accept that our systems are meant to be used in a certain way, focus on making that set of use cases sufficiently robust and well documented, and explain to our customers that the behaviour is not guaranteed if the system is used outside those boundaries. Sounds crazy? Well, look at a plane. It should be quite safe, shouldn't it? That's why we define flight envelopes within which the plane is guaranteed to operate safely. And yes, if the pilot forces the plane outside this safety bubble, it may simply stall and crash. But that is the pilot's (the user's) responsibility. Nobody in the aerospace industry would claim or guarantee that the plane flies in every possible condition. Why should we for our systems?


Mark Bereit - 8/6/2006

Francois,

Thank you for your comments!

I'm forced to disagree with your first reason why mechanical engineering performs better than computer engineering. You are contrasting the universal rules that apply to a particular physical object with the diversity of available tool chains and technologies. But there is plenty of diversity in the physical world, too. Nobody tries to apply a simple set of behavioral rules to, say, a "beam," when there are critical differences in the properties of a wooden beam, an iron beam, a steel beam, an aluminum beam, a concrete-and-rebar beam, and so on, and there are subdivisions of each type that have their own considerations (what type of wood? what process of steelmaking?). Physical laws and properties apply to a specific approach. For some time the manufacture of aircraft was based upon the known properties of wood-and-fabric construction, before the better approach of aluminum came along with its own new set of properties. I don't see that computer engineering is fundamentally different in rules versus diversity. What I do see is that computer engineering has very little of an environment for collecting engineering data. Between the speed of change and this awful habit of starting every project almost from scratch, we are not acquiring accumulated wisdom about what we can expect a certain class of component to do. And that is a problem.

Otherwise I agree with your comments. The ability of components to unexpectedly impact each other, and the complexity of input diversity, are serious concerns that require better thought than our industry has thus far applied. And deciding the boundary of responsibility between designer and user is also a complicated subject (even without lawyers added to the mix). We have a long way to go, but these are the things we need to be thinking about.


Francois Audeon - 8/7/2006

Hi Mark,

Thanks for your answer ;-)

With the first point I wanted to emphasize that HW engineers tend to make our lives overly complicated: the environment in which our SW operates is dictated more by individualism than by universal laws.

Take memory subsystems, for example. Almost every platform has its own cache characteristics: size, latency, line width, and so on.

So a precompiled SW component will not behave the same way when executed on system A as on system B. To me it's a bit like a mechanical system that has to operate on two different planets, where the environment influences its behavior at the very lowest level: the level of the atoms.
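
A toy C experiment shows the effect (the 64-byte line size is an assumption; that such details differ per platform is exactly the point). The same binary produces a different curve on system A and on system B:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Touch one byte per (assumed) 64-byte cache line, repeatedly. */
    static double ns_per_access(volatile unsigned char *buf, size_t size)
    {
        long accesses = 0;
        clock_t t0 = clock();
        for (int pass = 0; pass < (int)(64 * 1024 * 1024 / size) + 1; pass++)
            for (size_t i = 0; i < size; i += 64, accesses++)
                buf[i]++;
        return (double)(clock() - t0) / CLOCKS_PER_SEC * 1e9 / accesses;
    }

    int main(void)
    {
        /* Same code, same input: only the machine changes. */
        for (size_t kb = 16; kb <= 16 * 1024; kb *= 2) {
            unsigned char *buf = calloc(kb * 1024, 1);
            if (!buf) return 1;
            printf("%6zu KB: %5.2f ns/access\n", kb, ns_per_access(buf, kb * 1024));
            free(buf);
        }
        return 0;
    }

The knee of the curve sits wherever the working set stops fitting in cache, and that point is a property of the platform, not of the code.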

In general I think we can't solve such issues on our own; we also have to involve the HW communities. What we need first is universal HW subsystems at a sufficient level of granularity, a bit like the automotive industry, where the whole engine, and the gearbox as well, are shared across multiple brands. In our domain it's not sufficient to use the same CPUs; what we need are standardised "CPU + memory interface + caches + endianness + instruction set + ..." subsystems.

Kind regards, Cordialement, Met vriendelijke groeten.