Feedback: Software Development Paradigm Trap

Alain Gronner - 5/26/2006


I read the subject article with great interest and wanted to offer a few comments. But first, I should say that I'm not really a programmer (my direct programming experience dates mainly from my college days, many years ago); rather, as a fairly senior hardware and systems engineer, I've had to interface and deal with a good number of software programmers and system analysts. In that capacity I've faced many of the same frustrations you note in your article.

Some of the ideas you suggest in this article make a whole lot of sense; implemented, they are likely not only to improve software quality and dependability but also to yield an overall gain in productivity. This is all the more likely given that at least 50% of the time required to develop and release working software is spent in testing, a phase that can be substantially shortened by re-using fully debugged code modules.

One of my long-held gripes about software development is that it is common practice, and indeed expected, to develop custom software for each new product. In the modern clothing industry, few people are dressed by a tailor; most people, regardless of body shape, are quite content to purchase their pants and vests and what not off the rack. This allows the industry to offer quality clothes at very moderate prices.

Another example well worth studying, in my opinion, is what happened to the design of digital circuitry before the advent of the integrated circuit. At that time (the early 1960s), each circuit design engineer effectively designed gates and flip-flops (remember the Eccles-Jordan circuit?) from resistors, capacitors, diodes and transistors. A very large number of different designs were used for these building blocks, as each engineer tried to optimize their performance for each specific application. Then, with the advent of planar technology, Texas Instruments introduced its famous family of TTL logic ICs, the 74xx series. While circuit designers were at first reluctant to accept this product, its many advantages made it an extremely successful product line for TI and for most of its competitors in the semiconductor industry, who introduced compatible devices. For well over 30 years, virtually all logic designs were based either on discrete ICs or on larger integrated circuits combining a variety of these elementary building blocks. Even today, most CPLDs and FPGAs are based on them, and high-level languages have been developed to improve the productivity of design teams.

While originally IC's containing gates and flip flops were expensive and consumed a fair amount of power; many optimization techniques were developed to minimize the number of gates and FF's used to perform a given task. Today with the substantially lower cost per gate and their modest power consumption, hardly anyone in a design capacity is concerned by optimization of a logic circuit any more. Likewise, where earlier on it was necessary to minimize the use of memory (early processors had a modest 64 kB of memory address space) nowadays, memory is so inexpensive that little thought is given to optimizing its use.

It would seem to me, admittedly a relative novice in matters relating to software, that it could indeed be beneficial to the industry as a whole to do away with custom coding of each program and instead implement the desired program from a standard set of key functional blocks. This would, I expect, have a result similar to the earlier shift by hardware designers to logic IC technology.

To be successful, a group of expert programmers and system analysts should take up the task of defining those key functions/building blocks whose universal appeal would make the current coding approach/methodology a thing of the past.

I'm going to leave these thoughts with you. Feel free to run your thoughts or questions about this topic by me.

In closing, I'd like to thank you for an inspirational article and wish you good luck in your endeavors.

Alain Gronner, EE

Mark Bereit - 5/31/2006


Thank you for your comments. Yes, the 74xx TTL logic family seems like an excellent analogy for the needed level of components. The part that I'm fuzzy about, unfortunately, is the wires... that is, what is the interconnect for these components?

In our world of algorithms, "components" at the level of functions, or data/function objects, interact through parameter passing. The subroutine call, particularly the object member function call, is the interconnect glue. It's how we were taught to program: break a task into various small tasks, "functionize" where the subtasks are doing the same work on different data, etc. But we're simply dissecting a large algorithm, so it's no wonder that optimal "functionizing" tends not to be terribly reusable. How do we debug? Where the target allows, we set breakpoints (bringing the whole system to a halt) and step through the code, watching what is happening. (The more your system interacts with outside events, the more horribly this process fails to represent a running system.)
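To make the contrast concrete, here is a minimal sketch of that kind of decomposition, where the only "interconnect" is parameters and return values flowing through synchronous calls (all the names here are invented for illustration):

```c
/* A "component" hierarchy glued together by subroutine calls: each
 * stage is just a function, and data flows only through parameters
 * and return values. Names are hypothetical, not from the article. */
#include <ctype.h>
#include <string.h>

/* smallest subtask: normalize one character */
static char normalize(char c)
{
    return (char)tolower((unsigned char)c);
}

/* "functionized" subtask: the same work applied to different data */
static void normalize_buffer(char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = normalize(buf[i]);
}

/* the "large algorithm": a chain of synchronous calls; if any stage
 * stops (say, at a breakpoint), the whole system stops with it */
void process_message(char *msg)
{
    normalize_buffer(msg, strlen(msg));
}
```

The point of the sketch is the shape, not the work: every component runs only when its caller invokes it, which is exactly why breakpoint-and-step debugging halts everything at once.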

In the world of big systems of TTL logic (and I have done my share of this!) the components interact through wires which carry signals. Most often the signal is only a high or a low; open-collector and tri-state drivers are generally just a shortcut to merging drivers together. Absolutely everything is reactive, to a level that "event-driven programming" hasn't yet touched: downstream components react because of signals from upstream components. Single-bit information is often inadequate so we gather parallel bits into a "bus." In this world we are taught early the dangers of race conditions and the importance of imposing synchronous inputs where we need to reinstate order. How do we debug? By attaching probes to the wires and collecting, at high speed, selective dumps of the events that occurred in real time. (The more the components do internally, the less insight the logic state analyzer affords us.)

If software development were a bunch of cheap-and-little MPUs doing their own thing, as I think might be helpful, it would look a lot more like the TTL logic approach, where everything is asynchronously reactive except where we impose additional "clocking" to restore a synchronous nature. And I believe that debugging would indeed be probing and logging the interactions between components, not breakpoints and stepping. But what are the properties of these connections? Just as TTL defined a single logic level, with particular voltages, output drive capabilities and input current requirements, we need to figure out the specs of these interconnects. My hardware background keeps wanting to make these interconnects so simple as to be of little help in building big systems; my software background keeps wanting to make these interconnects fancy packet passing networks so lots of data gets messaged around without having to think about it. Is there some happy medium that would have enough applicability to work in a lot of different target projects? I don't know. But I'm having a lot of brow-furrowing fun thinking about it.
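One candidate "happy medium" might look something like the following sketch: a software "wire" that carries one value and does nothing but trigger a reaction downstream, somewhere between a bare logic level and a full packet network. This is only a thought experiment under my own assumptions, not a spec; every name here is invented:

```c
/* A hypothetical software "wire": the upstream component drives it,
 * the downstream component reacts. No queues, no packets -- just a
 * value and a reaction, like a signal on a PCB trace. */
#include <stdint.h>
#include <stddef.h>

typedef void (*sink_fn)(uint32_t value, void *ctx);

typedef struct {
    sink_fn sink;  /* downstream component's reaction */
    void   *ctx;   /* downstream component's state */
} wire_t;

/* "solder" a downstream component onto the wire */
static void wire_connect(wire_t *w, sink_fn sink, void *ctx)
{
    w->sink = sink;
    w->ctx  = ctx;
}

/* driving the wire is the only way components interact; the
 * downstream reaction is immediate and asynchronous in spirit */
static void wire_drive(wire_t *w, uint32_t value)
{
    if (w->sink)
        w->sink(value, w->ctx);
}

/* an example downstream component: accumulates whatever arrives */
static void accumulator_sink(uint32_t value, void *ctx)
{
    uint32_t *total = ctx;
    *total += value;
}
```

Debugging such a system would mean logging every `wire_drive` call, much like clipping a logic analyzer probe onto a trace, rather than halting any one component.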

If you have more thoughts on the subject, please feel free to throw them my way. In any event, thank you for writing!

Mark Bereit

Alain Gronner - 6/6/2006


Regarding your question about how 74xx TTL components are interconnected, the answer is that this is typically accomplished via conductors on the printed circuit board to which each logic IC is secured. But I'm sure you knew that already, so I probably did not understand your question the way you meant it.

With regard to the scheme you propose, namely to replace a powerful micro-controller executing a large, complex software program with a number of function-dedicated smaller micro-controllers, I find it quite seductive, but at the same time not without significant shortcomings. In fact, in one of the last products for which I was the system engineer, this is the strategy our engineering team implemented. The product was a sophisticated telecommunication tester (a kind of protocol analyzer of sorts). It contained a total of 13 micro-controllers (mostly 80188s and one 68000). The 68000 was the conductor and directly managed the feature-rich front panel; it communicated with each function-dedicated micro via dual-ported RAM. The modularity of this architecture gave us a number of benefits: easily managed parallel development, easier system troubleshooting through segmentation, a manageable task for each SW engineer, and very few development bottlenecks. So this worked out quite well. At the same time, we fully realized at the outset that all this flexibility was quite costly in overall board space, power, and total component cost. In other words, this was an expensive approach. Its saving grace was that the development effort (including testing) was minimized, and we could adopt the strategy only because there is little cost sensitivity in this type of product, as production volumes are (unfortunately) modest.
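For readers unfamiliar with the technique, a dual-ported-RAM handshake of the general kind described might be sketched as follows. This is only one common pattern, with an ownership flag so the two sides never write the mailbox at the same time; the layout and names are illustrative, not the tester's actual design:

```c
/* Sketch of a mailbox in dual-ported RAM shared by a "conductor"
 * micro and a function-dedicated peripheral micro. Hypothetical
 * layout; the ownership flag doubles as the doorbell. */
#include <stdint.h>

enum { OWNER_CONDUCTOR = 0, OWNER_PERIPHERAL = 1 };

typedef struct {
    volatile uint8_t  owner;       /* who may write the mailbox now */
    volatile uint8_t  command;
    volatile uint8_t  payload[64];
} mailbox_t;

/* conductor side: post a command, then hand the mailbox over */
static int conductor_post(mailbox_t *mb, uint8_t cmd)
{
    if (mb->owner != OWNER_CONDUCTOR)
        return -1;                    /* peripheral still busy */
    mb->command = cmd;
    mb->owner   = OWNER_PERIPHERAL;   /* ownership transfer = doorbell */
    return 0;
}

/* peripheral side: consume the command and hand the mailbox back */
static int peripheral_poll(mailbox_t *mb, uint8_t *cmd_out)
{
    if (mb->owner != OWNER_PERIPHERAL)
        return 0;                     /* nothing for us yet */
    *cmd_out  = mb->command;
    mb->owner = OWNER_CONDUCTOR;
    return 1;
}
```

The attraction of the pattern is that neither side ever blocks the other; the cost, as noted above, is an extra dual-ported RAM (or equivalent) per peripheral micro.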

With the relentless drive toward smaller and smaller instruments, this strategy makes little sense. All the more so if power consumption must be severely minimized (as in battery powered applications) and/or when production volumes are very high such as in many consumer products.

With the huge FPGAs currently available, it may well be possible to house several optimized micros in a single FPGA, but again all these gates and flip-flops require power to operate, and not a small amount of power at that. So this is far from a great solution.

So, while I fully subscribe to the idea that modularity and simplicity of each module are highly desirable objectives, the economics may only make sense in some specialized applications. In addition, one should not lose sight of the fact that the overall reliability of any system decreases rapidly as the number of distinct components increases. A neat, simple and effective solution is still an elusive goal, I'm afraid.

Incidentally, I believe that you are describing event-driven logic as operating in an asynchronous mode. While this type of implementation generally results in a streamlined design, you often pay for that simplicity with exposure to race conditions and, ultimately, low reliability. Synchronous design typically requires more gates and/or flip-flops, but the predictability, and thus the reliability, of the design is usually far superior.

At this point, I can only encourage you to keep up your brow-furrowing efforts. Should you need to bounce some of your ideas off a fairly critical (but now retired) hardware/systems engineer, you're welcome to call on me.

Good luck on your efforts.

Alain Gronner

Mark Bereit - 6/10/2006


Sorry I wasn't clear: my question is what, in a software component world, would be the nature of those interconnects that fill the role of the wires or PCB traces in digital logic.

I agree that the lots-of-cores approach is relatively expensive in both board size and power consumption. I don't necessarily agree that this is inherently true, though. Imagine two solutions: a 200 MIPS ARM core, or twenty 10 MIPS 8051 cores, each applied to a task. With the same total MIPS we can theorize that the two approaches could accomplish the same end result. Which uses more board space? Which eats more power? I chose those specific examples because I saw a PowerPoint presentation by Frank Vahid that pegged the FPGA implementation of the 8051 core at up to 40K gates and 40 MHz, and the ARM core at up to 2M gates and 200 MHz. It looks to me like the 8051 approach needs fewer gates (800K vs. 2M). Throw in some overhead for the communications links between 8051 cores, sure, but take away expense and power consumption for the lower clock speed. The more-MPU approach could be cheaper.
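The back-of-envelope arithmetic above can be spelled out; the per-core figures (40K gates for the 8051-class core, 2M for the ARM-class core) are the ones quoted from the presentation, and the function is just the multiplication:

```c
/* Back-of-envelope gate budget for the many-small-cores approach.
 * The per-core figures are the ones quoted in the text. */
static long total_gates(long gates_per_core, long cores)
{
    return gates_per_core * cores;
}
```

So `total_gates(40000, 20)` gives 800,000 gates for twenty 8051 cores, against 2,000,000 for the single ARM core, before any allowance for inter-core links.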

When I read about new high-end processors, Intel certainly included, I see how more and more transistors (or gates, if we're talking about synthesized cores like the ColdFires I use) are being committed to pipeline management, branch prediction, speculative result storage, and (ouch!) deep caches to overcome slow memory. That is, the complexity/power tradeoff curve gets worse as you scale up the single processor approach, but progresses more linearly as you scale up the cooperating processors approach.

So I suspect that most of the disadvantages to multiple processors relate to presently available (that is, market-demanded) implementations, not what is achievable.

Mark Bereit

Alain Gronner - 6/12/2006


It's fairly easy to overlook the fact that, of the transistors on a chip, by far the largest number communicate only with nearby transistors on the same chip, and only very few interface with the outside world. Those interfacing with the outside world are the ones consuming a disproportionate amount of power (and requiring the most chip real estate), because the large I/O transistors must drive relatively large capacitive loads. So a chip housing a ton of transistors but relatively few I/Os may consume well below what the same number of transistors would consume distributed among many chips communicating via multiple parallel ports.

On top of that, one should keep in mind that not all these smaller cores/micros are necessarily self-sufficient; some may require dedicated memories and possibly other peripheral devices. And for good measure, one still has to address the communication issue (dual-ported RAM or other devices), which adds not only cost but also board real estate and power requirements.

Additionally, since the program is now distributed, you face the complex task of updating the software when a new version is to be installed in place of the previous one. With a large centralized micro, you have only one program memory to change or reload.

So, given these observations, it would appear that the deck is really stacked against the distributed micro architecture. Such an architecture would still have merit, in my opinion, when cost, space, and power requirements are not paramount while the software development effort (including testing and perhaps documentation) is to be more tightly controlled.

I, for one, like the distributed micro architecture a lot. At the same time, I'm forced to recognize that with currently available technology, I just don't see how it can overcome its inherent limitations/drawbacks for mass produced products where the margins are so slim and the competition so fierce. For applications such as motor controllers in robotics for instance, the distributed micro architecture may well be a very attractive approach.

One possible way out of this conundrum is a very large FPGA containing a number of micro cores and a sizeable RAM, plus an external flash program memory. At initialization, the main or coordinator micro downloads to each peripheral micro the portion of the code required for its operation. In this way the entire program may be stored in a single flash memory device, yet it consists of a series of short, fairly simple programs. And because most of the communication between cores takes place on-chip, the power requirement is minimized.
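In code, the coordinator's init-time job might be sketched as below: walk a table in the single flash image and copy each peripheral core's program into that core's RAM window. The table layout, names and load mechanism are all assumptions for illustration:

```c
/* Sketch of init-time code distribution by a coordinator micro.
 * One flash image holds every core's program; a hypothetical table
 * says which slice belongs to which core. */
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  core_id;       /* which peripheral micro */
    uint32_t flash_offset;  /* where its code starts in the image */
    uint32_t length;        /* bytes of code */
} load_entry_t;

/* copy each entry's slice of flash into that core's RAM window;
 * returns -1 if any image won't fit */
static int load_all(const uint8_t *flash,
                    const load_entry_t *table, int entries,
                    uint8_t *core_ram[], uint32_t ram_size)
{
    for (int i = 0; i < entries; i++) {
        const load_entry_t *e = &table[i];
        if (e->length > ram_size)
            return -1;
        memcpy(core_ram[e->core_id], flash + e->flash_offset, e->length);
    }
    return 0;
}
```

After `load_all` succeeds, the coordinator would release each peripheral core from reset; each then runs only its own short, simple program.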

With this approach, however, it must be noted that testability and troubleshooting/debugging are still a problem, because providing test-point access requires lots of I/O pins.

Anyway, these are some thoughts you may want to consider in your efforts to solve a rather sticky but fascinating problem.