Archive for the ‘Software Quality’ Category

Guest Article: Static Analysis in Medical Device Software (Part 2) — Methodology

Thursday, June 4th, 2009

Pascal Cuoq at Frama-C continues his discussion of static analysis for medical device software. This is part 2 of 3. Part 1 is here.

In the second part of this article I write about methodology, where tools and engineering come together to produce software that you can entrust with lives. I do not avoid talking about the work my colleagues and I do, but I do mention the work of others too.

The layman often assumes that it must be impossible to make software that works as intended. It is a natural conclusion to draw from one's experience with personal computers, mobile phones, car on-board computers and vending machines. The layman's opinion is biased because for most people, embedded software is the means, rather than an end, and therefore is never noticed when it works. For instance, my own digital reflex camera contains a fair amount of software. Still, I have never observed it to deviate from the behavior described in the thick manual that came with it -- there are some particularities that I would call functional bugs, but since the manual describes them at length, as the old joke goes, they are features. Software that works is not impossible. It is only that, as the late Douglas Adams would have put it, software that doesn't work is slightly cheaper. Moderately large software systems that work well enough not to be noticed can be produced. It is "only" a matter of having simple rules that enforce readability of the developed code by people who have not written it, and an appropriately sized budget for code reviews and quality assurance (usually testing, but bug-finding software analyzers are used here too, and they would be used more if their strengths were not so widely misunderstood). This statement does not include very large codebases and concurrent systems, which we still aren't very good at building reliably but keep trying anyway.

The specification for my digital camera is the thick manual, although there are also internal specifications for sub-components of the camera's software that I, as an end user, do not get to see. The internal specifications naturally tend to be more technically detailed as they deal with smaller and smaller sub-components. As components are assembled, it becomes possible to check that the corresponding specification for the sub-assembly is satisfied. This method is called the V-Model of software development, although one wonders why it needs such a high-sounding name: almost every manufactured physical object has been built from sub-components with pre-determined specifications since time immemorial.

This has nothing to do with the production of critical code. Or rather, the two components above, development according to an enforced development standard, and quality assurance (debugging), remain but become a small part of the picture in the development of critical code. Two additional components, at least as large as the first two, are the certification and the authority.

Certification is the additional, reflexive examination of the development, verification (i.e. the software conforms to specification) and validation (i.e. the specification corresponds to the actual need) processes.

One difference between software and hardware is that it is harder to make sure that software satisfies the original requirements. This was made very clear in the article that prompted this series of blog posts. And this is why critical software particularly needs certification. Certification is not so much the testing of the software against the specification (this is called "debugging" and it's not specific to critical software) as a cohesive list of arguments leading to the conclusion that whatever testing has been done was sufficient to find any possible flaw with the expected confidence. A certification file does not state "we used this development tool and we ran these 1000 tests for this component" but "we used this development tool, and here are the reasons why we think it's acceptable. Here are the reasons why we think that these 1000 tests are sufficient to ensure that this component works as expected (and, incidentally, here are the tests and their results)". As you would expect, when a static analyzer is used, the certification file does not read "Here is the tool we used and the results we obtained" but "Here is the tool we used. Here is how we established that this tool could reliably be used to ensure this aspect of the requirements, (and incidentally, here are the results we obtained)".

The authority defines the expectations for the certification, and studies the certification file once submitted. In the end, it all comes down to convincing the competent, financially disinterested humans who check the certification file that all the necessary steps have been taken to ensure the safety of the critical device.

We now arrive at the first statement I disagree with from the article: that in static analysis of software, "achieving a 100% recall rate is rare, if not impossible, and may only be possible at the cost of a very high number of false positives".

First, a 100% recall rate corresponds to the absence of false negatives, which is a perfectly achievable objective. Static analyzers with this property are called "correct" (or "sound"). These adjectives have meaning only in a context where it is clear what bugs are being looked for and what assumptions are made to this end. Assuming this context is unambiguous, they mean that as long as the tool's assumptions are respected, no bug in the analyzed program is left undetected.

Two examples of commercially available static analyzers that have been designed from the ground up to have no false negatives are PolySpace, now distributed by The MathWorks, and Astrée, soon to be distributed by AbsInt. Allow me, however, to translate the sentence "Astrée is capable of producing exactly zero false alarms" from that web page: "false alarms" mean "false positives". Astrée, by design, does not have any false negatives. If it failed to notice a possible run-time error, it would be a bug which, I am sure, would be promptly fixed. The "no false positives" claim only means that it does not have any false positives on some pre-determined representative pieces of software. It is certainly not a guarantee, since, as stated earlier, it is a mathematical impossibility for a static analyzer to reach a verdict for any analyzed program with neither false positives nor false negatives. The best way to determine the number of false positives you can expect Astrée to produce for your code is, as with any other analyzer, to try it.

Now, except in the magical world of marketing, it is indeed true that the fewer false negatives are allowed in the results, the more false positives can be expected to be found. This is the same dilemma that occurs every time something can only imperfectly be detected. Considering the target readership for the blog of my kind host, I do not think that I need to harp on this. But if the medical test analogy does not work for you, consider the example of the shoestring eyelets on my shoes, which cause the metal detectors in international airports to ring almost every time (false positive) because it has become unacceptable in the last few years to have the slightest risk of a weapon going undetected through the controls (false negative).

Every system has its assumptions: in the case of the airport detector, one is that a weapon is assumed to include some metal. This is a good opportunity to introduce in passing another distinction: "safety" works against the physical world (failures, birds flying into reactors, ...). "Security" works against conscious opponents who are actively trying to use your assumptions to their advantage. This distinction can be applied to software analysis but it is more general than that. Still, even if what you are doing is categorized as "safety", if it's critical, you have to be aware of your assumptions. So the two disciplines are not always very different in philosophy, although they often aim at different objectives.

Thanks to a number of recent advances on the theoretical side, as well as the increase in the computational power available in the workstations where the analysis takes place, you can expect the number of false positives given by a correct static analyzer on your embedded code to be contained. It would be prudent, however, to disbelieve claims that there won't be any.

In addition to the above two static analyzers, I can mention Caveat, another static analyzer without false negatives that has been developed in the laboratory where I work. Caveat is commercially available, although we do not advertise it because it is targeted at very-high-criticality software, which does not concern many projects (we consider it to be most useful for code with a criticality comparable to level A, the highest in the DO-178B avionics certification standard). Since I am in a mood to take single sentences from web pages and comment on them, please allow me to do it once more: the sentence "[Using Caveat, Airbus France's] goal is to detect errors as soon as possible in the development cycle, and not to prove the software" was written at a time when Airbus France was indeed experimenting with Caveat as an R&D project. This sentence is now completely obsolete. Caveat has been officially used for part of the verification of part of the software of the Airbus A380 — that is, precisely, to establish beyond doubt certain properties about the analysed source code, and in place of the unit tests whose role would have been to establish these properties in a more traditional process. As the DO-178B standard mandates, Caveat has been qualified by Airbus as a verification tool to be used for the certification of this particular software.

Also from this laboratory, there is Frama-C, which is available too, since it is Open Source. Frama-C is the research prototype to which experimentation with new ideas has shifted (while Caveat is still being maintained for Airbus and any industrial user who requires it). Frama-C is more of a framework for static analyzers than a static analyzer per se. The analyzers that have been developed in Frama-C so far rely on various techniques, but they are all without false negatives. Some of these analyzers are now reliable enough to be considered for R&D experimentation. Caveat was a research prototype too at the time Airbus decided to use it in production and to make it part of its certification process. Whether or not the tool you intend to use comes in a cardboard box, you will have to explain the measures you took to ensure that it was the right tool for what you were using it for. What it is called matters less than the measures you took.

The second statement from the article I disagree with is that "static analysis is intended to supplement and improve the effectiveness of existing best practices in testing. It should not be thought of as a substitute for device developers' current testing activities". Of course, if you are using a bug-finding static analyzer with false negatives, you will have a hard time justifying why you removed a single test from those you would have done without the analyzer. Such a tool is most useful in the debugging phase, to identify and remove bugs as quickly as possible, not in the verification phase of a process subject to certification. But when Airbus used Caveat for the A380, it was precisely as a substitute for existing unit tests. The fact that Caveat is designed not to have false negatives was one of the arguments in the validation of Caveat as a verification tool to establish, with the required confidence, the properties that were previously guaranteed by these unit tests.

Another way to look at this question is the following: bug-finding static analyzers (that have false negatives) have the potential to be better for debugging than sound analyzers (without false negatives) because by tolerating false negatives, they can reduce the number of false positives (and save the user time). This debugging phase can be, and often is, lightly covered in the certification because it is later followed by verification, which is the important second check. In a certification-covered verification process, the bugs have already been ironed out and the engineers are not trying to find more bugs but to prove that there aren't any. Any positive is going to be a false positive in this context, even if it comes from the most cautious heuristic tool (a tool that makes a lot of effort to warn only when it is quite certain that a problem exists). On the other hand, during the certified verification process, a heuristic tool's contribution to the bottom line is harder to quantify, since the objective of verification is not to find bugs but to establish that there aren't any.

The statement that there aren't any bugs left when certification starts may look like an exaggeration, but it isn't. If the certification requirements are stringent, changing any part of the code (to fix a bug) means starting the verification from scratch. This is a protection against, among other things, the dangers of C that were alluded to in the first part of this article. If you find bugs at that stage, you are not doing it optimally from the economic point of view (and you are starting afresh a heavy, certification-covered verification process in which, hopefully for you, you will not discover any new bug this time).

I would like to acknowledge the careful editing of my host, the suggestions of my colleague Virgile Prevosto in writing part 1, and the remarks of both my supervisor Benjamin Monate and David Delmas (Airbus France) concerning the present part 2 of this article. The third and last part of this series will be on the topic of formal functional specifications, one of the under-used new tools that have a contribution to make in the verification of critical software. In conclusion, here is a quoted statistic in the style, if not the spirit, of Douglas Coupland's Generation X:

Number of human lives whose loss has been attributed to software failure of a civil airplane: 0

Guest Article: Static Analysis in Medical Device Software (Part 1) — The Traps of C

Friday, May 15th, 2009

Any software-controlled device that is attached to a human presents unique and potentially life-threatening risks. A recent article on the use of static analysis for medical device software prompted Pascal Cuoq at Frama-C to share his thoughts on the subject. This is part 1 of 3.

The article Diagnosing Medical Device Software Defects Using Static Analysis gives an interesting overview of the applicability of static analysis to embedded medical software. I have some experience in the field of formal methods (including static analysis of programs), and absolutely none at all in the medical domain. I can see how it would be desirable to treat software involved at any stage of a medical procedure as critical, and coincidentally, producing tools for managing critical software has been my main occupation for the last five years. This blog post constitutes the first part of what I have to say on the subject, and I hope someone finds it useful.

As the article states, in the development of medical software, as in many other embedded applications, C and C++ are used predominantly, for better or for worse. The "worse" part is an extensive list of subtle and less subtle pitfalls that seem to lurk in every corner of these two languages.

The most obvious perils can be avoided by restricting the programmer to a safer subset of the language -- especially if it is possible to recognize syntactically when a program has been written entirely in the desired subset. MISRA C, for instance, defines a set of rules, most of them syntactic, that help avoid the obvious mistakes in C. But only a minority of C's pitfalls can be eliminated so easily. A good sign that coding style standards are no silver bullet is that there exist so many. Any fool can invent theirs, and some have. The returns of mandating more and more coding rules diminish rapidly, to the point that overdone recommendations found in the wild contradict each other, or in the worst case, contradict common sense.

Even written according to a reasonable development standard, a program may contain bugs liable to result in run-time errors. Worse, such a bug may, in some executions, fail to produce any noticeable change, and in other executions crash the software. This lack of reproducibility means that a test may fail to reveal the problem, even if the problematic input vector is used.
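
A hedged sketch of the kind of bug being described: a one-byte buffer overflow whose visible effect depends on how a particular build happens to lay out memory (the function name and strings are invented for the illustration).

    #include <stdio.h>
    #include <string.h>

    static void store_id(const char *id)
    {
        char buf[8];
        strcpy(buf, id);      /* no bounds check: an 8-character id needs
                                 9 bytes, so the terminating '\0' lands one
                                 byte past the end of buf, on whatever the
                                 compiler placed there */
        printf("stored %s\n", buf);
    }

    int main(void)
    {
        store_id("ABC123");   /* fits: nothing visible goes wrong */
        store_id("ABCD1234"); /* one byte too long: some builds and runs look
                                 normal, others corrupt data or crash */
        return 0;
    }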

A C program may yet hide other dangerous behaviors. The ISO 9899:1999 C standard, the bible for C compiler implementers and C analyzer implementers alike, distinguishes "undefined", "unspecified", and "implementation-defined" behaviors. Undefined behaviors correspond roughly to the run-time errors mentioned above. The program may do anything if one of these occurs, because it is not defined by the standard what it should do. A single undefined construct may cause the rest of the program to behave erratically in apparently unrelated ways. Proverbially, a standard-compliant compiler may generate a program that causes the device to catch fire when a division by zero happens.
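
As a small, hedged illustration of how little the standard promises once an undefined construct is reached (the scale function here is invented for the sketch):

    #include <stdio.h>

    static int scale(int raw, int divisor)
    {
        return raw / divisor;   /* if divisor is 0, this is undefined
                                   behavior: the standard places no
                                   constraint on what happens next */
    }

    int main(void)
    {
        printf("%d\n", scale(100, 4));   /* well defined: prints 25 */
        printf("%d\n", scale(100, 0));   /* undefined: anything may happen */
        return 0;
    }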

Implementation-defined behaviors represent choices that are not imposed by the standard, but that have to be made by the compiler once and for all. In embedded software, it is not a viable goal to avoid implementation-defined constructions: the software needs to use them to interface with the hardware. Additionally, size and speed constraints for embedded code often force the developer to use implementation-defined constructs even where standard constructs exist to do the same thing.

However, in the development of critical systems, the underlying architecture and compiler are known before software development starts. Some static analysis techniques lend themselves well to this kind of parameterization, and many available tools that provide advanced static analysis can be configured for the commonly available embedded processors and compilers. Provided that the tests are made with the same compiler and hardware as the final device, the existence of implementation-defined behaviors does not invalidate testing as a quality assurance method, either.

Unspecified behaviors are not treated as seriously as they should be by many static analysis tools. That's because, unlike undefined behaviors, they cannot set the device on fire. Still, they can cause different results from one compilation to the other, from one execution to the other, or even, when they occur inside a loop, from one iteration to the other. Like the trickiest of run-time errors, they lessen the value of tests because they are not guaranteed to be reproducible.

The "uninitialized variable" example in the list of undesirable behaviors in the article is in fact an example of unspecified behavior. In the following program, the local variable L has a value, it is only unknown which one.

Computing (L-L) in this example reliably gives a result of zero.

Note: For the sake of brevity, people who work in static analysis have a tendency to reduce their examples to the shortest piece of code that exhibits the problem. In fact, in writing this blog post I realized I could write an entire other blog post on the deformation of language in practitioners of static analysis. Coming back to the subject at hand, of course, no programmer wants to compute zero by subtracting an uninitialized variable from itself. But a cryptographic random generator might for instance initialize its seed variable by mixing external random data with the uninitialized value, getting at least as much entropy as provided by the external source but perhaps more. The (L-L) example should be considered as representing this example and all other useful uses of uninitialized values.

Knowledge of the compilation process and lower-level considerations may be necessary in order to reliably predict what happens when uninitialized variables are used. If the local variable L was declared of type float, the actual bit sequence found in it at run-time could happen to represent IEEE 754's NaN or one of the infinities, in which case the result of (L-L) would be NaN.
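
For illustration, a hedged variant of the sketch above with L declared as float:

    #include <stdio.h>

    int main(void)
    {
        float L;              /* uninitialized: the bit pattern might happen
                                 to encode an IEEE 754 NaN or infinity */
        float delta = L - L;  /* NaN - NaN and inf - inf are both NaN, so
                                 delta is not guaranteed to be 0.0f */
        printf("%f\n", delta);
        return 0;
    }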

Uninitialized variables, and more generally unspecified behaviors, are indeed less harmful than undefined behaviors. Some "good" uses for them are encountered from time to time. We argue that critical software should not exhibit any unspecified behavior at all. Uses of uninitialized variables can be excluded by a simple syntactic rule, "all local variables should be initialized at declaration", or, if material constraints on the embedded code mean this price is too high to pay, with one of the numerous static analyzers that reliably detect any use of an uninitialized variable. Note that because of C's predominant use of pointers, it may be harder than it superficially appears to determine whether a variable is actually used before being initialized or not, even in ordinary programs.

There are other examples of unspecified behaviors not listed in the article, such as the comparison of addresses that are not inside the same aggregate object, or the comparison of an invalid address to NULL. I am in fact still omitting details here. See the carefully worded §6.5.8 in the standard for the actual conditions.

An example of the latter unspecified behavior is (p == NULL) where p contains an invalid address computed as t+12345678 (t being a char array with only 10000000 cells). This comparison may produce 1 when t happens to have been located at a specific address by the compiler, typically UINT_MAX-12345677, and it produces 0 in all other cases. If there is an erroneous behavior that manifests itself only when this condition produces 1, a battery of tests is very unlikely to uncover the bug, which may remain hidden until after the device has been widely deployed.
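
A hedged sketch of the scenario, using the array size and offset quoted above:

    #include <stdio.h>

    static char t[10000000];

    int main(void)
    {
        char *p = t + 12345678;   /* well outside t: the standard does not
                                     specify the result of comparing this
                                     address to NULL */
        if (p == NULL)            /* could be 1 on the rare build where t is
                                     placed so that the sum wraps to 0 */
            printf("p compared equal to NULL\n");
        else
            printf("p compared different from NULL\n");
        return 0;
    }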

An example of comparison of addresses that are not in the same aggregate object is the comparison (p <= q), when p and q are pointers to memory blocks that have both been obtained by separate calls to the allocation function malloc. Again, the result of the comparison depends on uncontrolled factors. Assume such a condition made its way by accident into a critical function. The function may have been unit-tested exhaustively, but the unit tests may not have taken into account the previous sequence of block allocations and deallocations that results in one block being positioned before or after the other in the heap. A typical static analysis tool is smarter, and may consider both possibilities for the result of the condition, but we argue that in critical software, the fact that the result is unspecified should in itself be reported as an error.
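
A minimal sketch of that kind of condition, with two arbitrarily sized blocks:

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        char *p = malloc(100);    /* two separately allocated objects */
        char *q = malloc(100);
        if (p == NULL || q == NULL)
            return 1;

        if (p <= q)               /* p and q do not point into the same
                                     object, so the result of this relational
                                     comparison is unspecified: it depends on
                                     where the allocator placed the blocks */
            printf("p was placed before q this time\n");
        else
            printf("q was placed before p this time\n");

        free(p);
        free(q);
        return 0;
    }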

Another failure mode for programs written in C or any other algorithmic language is the infinite loop. In embedded software, one is usually interested in an even stronger property than the absence of infinite loops: the verification of a predetermined bound on the execution time of a task. Detection of infinite loops is a famous example of an undecidable problem. Undecidable problems are problems for which it is mathematically impossible to provide an algorithm that, for any input (here, a program to analyze), eventually answers "yes" or "no". People moderately familiar with undecidability sometimes assume this means it is impossible to make a static analyzer that provides useful information on the termination of an analyzed program, but the theoretical limitation can be worked around by accepting a little imprecision (false negatives, or false positives, or both, in the diagnostic), or by allowing that the analyzer itself will, in some cases, not terminate.

The same people who recognize termination of the analyzed program as an undecidable property, one for which theory states that a perfect analyzer cannot be made, usually fail to recognize that precisely detecting run-time errors or unspecified behaviors is an undecidable problem as well. For these questions, it is also mathematically impossible to build an analyzer that always terminates and emits neither false positives nor false negatives.

Computing the worst-case execution time is at least as hard as verifying termination, therefore it is undecidable too. That is the theory. In practice, there exist useful static analyzers that provide guaranteed worst-case execution times for a piece of software. They achieve this by limiting the scope of the analysis, firstly, to the style of code that is common in embedded software, and secondly, to the one sub-task whose timing is important. This kind of analysis cannot be achieved using the source code alone. The existing analyzers all use the binary code of the task at some point, possibly in addition to the source code, a sample of the processor to be used in the device, or only an abstract description of the processor.

This was part one of the article, where I tried to provide a list of issues to look for in embedded software. In part two, I plan to talk about methodology. In part three, I will introduce formal specifications, and show what they can contribute to the issue of software verification.

Software Verification vs. Validation

Thursday, March 26th, 2009

For some reason it just really bugs me that these two terms are incorrectly interchanged so frequently.

Part of the problem is that the document General Principles of Software Validation; Final Guidance for Industry and FDA Staff (2002) does not do a good job of differentiating actual verification and validation activities. They just call everything validation.

The recent MD&DI article Building Quality into Medical Device Software provides a pretty good overview of the these regulatory requirements, but is a another case in point.  The article talks about "software validation" at every step just like the FDA document.

Another similar article on this subject is Software Validation: Turning Concepts into Business Benefits. It is also confused, e.g. (my highlight):

... software validation involves the execution of tests designed to cover each of the specific system requirements.

No, testing specific requirements is a verification activity! It's no wonder most people are confused.

These definitions, Difference between Verification and Validation, are better as they highlight the sequencing of activities:

Verification takes place before validation, and not vice versa. Verification evaluates documents, plans, code, requirements, and specifications. Validation, on the other hand, evaluates the product itself.

From here (warning PDF):
verification vs. validation

Validation activities (usability testing, user feedback, etc.) are much harder to define, execute, and document properly than most verification testing.

Here are the golden rules:

Verification: was the product built right?

Validation: was the right product built?

I guess I should get over it...

UPDATE (5/12/09):  Good definitions from here: Diagnosing Medical Device Software Defects Using Static Analysis:

Verification and validation are terms that are often used in software. However, it is important to understand the difference between these two distinct but complementary activities. Software verification provides objective evidence that the design outputs of a particular phase of the software development life cycle meet all of the specified requirements for that phase by checking for consistency, completeness, and correctness of the software and its supporting documentation. Validation, on the other hand, is the confirmation by examination and provision of objective evidence that software specifications conform to user needs and intended uses, and that the particular requirements implemented through software can be consistently fulfilled.

UPDATE (8/6/09):  The importance of proper V&V cannot be overstated. The FDA is watching: FDA still enforcing regulations for validation of enterprise software.

UPDATE (2/11/10): I just noticed that the guidance document link on the FDA site was changed and fixed it. When I reviewed the document I found that even though it was "issued" in Jan. 2002 it had been recently updated (11/6/09). The later sections (4, 5, and 6) still use the term validation generically, but the updated document does distinguish between verification and validation:

3.1.2 Verification and Validation

The Quality System regulation is harmonized with ISO 8402:1994, which treats "verification" and "validation" as separate and distinct terms. On the other hand, many software engineering journal articles and textbooks use the terms "verification" and "validation" interchangeably, or in some cases refer to software "verification, validation, and testing (VV&T)" as if it is a single concept, with no distinction among the three terms.

Software verification provides objective evidence that the design outputs of a particular phase of the software development life cycle meet all of the specified requirements for that phase. Software verification looks for consistency, completeness, and correctness of the software and its supporting documentation, as it is being developed, and provides support for a subsequent conclusion that software is validated. Software testing is one of many verification activities intended to confirm that software development output meets its input requirements. Other verification activities include various static and dynamic analyses, code and document inspections, walkthroughs, and other techniques.

Software validation is a part of the design validation for a finished device, but is not separately defined in the Quality System regulation. For purposes of this guidance, FDA considers software validation to be "confirmation by examination and provision of objective evidence that software specifications conform to user needs and intended uses, and that the particular requirements implemented through software can be consistently fulfilled." In practice, software validation activities may occur both during, as well as at the end of the software development life cycle to ensure that all requirements have been fulfilled.

More Software Forensics and Why Analogies Suck

Tuesday, July 1st, 2008

There's a recent article in the Baltimore Sun called Flaws in medical coding can kill, which just rehashes static software analysis (hat tip: FDA Trying to Crack Down on Software Errors).

I've discussed software forensics tools before. Yes, bad software has hurt and killed people, and there's no excuse for it.  I still don't think an expensive automated software tool is the silver bullet (which is implied by the article) for solving these problems.

But here's what really bugs me:

"If architects worked this way, they'd only be able to find flaws by building a building and then watching it fall down"

This is a prime example of why analogies suck.  The quote is supposed to somehow bolster the FDA's adoption of "new forensic technology". If you stop and think about it, it does just the opposite.

I guess you first have to consider the source -- a VP of Engineering for a forensic software vendor. This is exactly what you'd expect to hear in a sales pitch.

What's truly ironic though is that a static analysis tool can only be used on source code! Think about it. Source code is the finished product of the software design and development process. Also, forensic science, by definition, is the examination of something that has already happened. It can only be done after the fact.

The logical conclusion you would draw from the analogy is that static analysis is probably useless because the building is already up!  If you step back and look at the full software quality process, this may well be true.

I'm not saying that static analysis tools don't have value. Like all of the other software tools we use, they have their place.

Just beware when you try to use an analogy to make a point.

UPDATE (7/5/08):

Here's another take on medical device bugs: When bugs really do matter: 22 years after the Therac 25.

UPDATE (7/16/08):
From Be Prepared: Software Forensics Gaining Steam at FDA, David Vogel of Intertech Engineering Associates says:

... that static tools are hyped to do more than they can actually deliver. “Static analysis looks for simple coding errors and does not apply heuristics to understand how it will perform dynamically because it is a static analysis tool”

I agree.

UPDATE (7/26/08):

Another reference: Are hospitals really safe?

UPDATE (9/16/08):

A couple more related articles:

Applying Static Analysis To Medical Device Software

Using static analysis to diagnose & prevent failures in safety-critical device designs

UPDATE (9/27/08):

Architecting Buildings and Software: Software architects are an important component in the creation of quality software and need to continue to refine and improve their role in the development process.  No matter how you try to bend and twist it though, the building analogy will always be problematic -- so why bother? Maybe that "intuitive understanding" of the construction industry just distracts us from being innovative about what's required to build software.

UPDATE (12/1/08): If Jeff wasn't a programmer he'd be a farmer: Tending Your Software Garden

Connecting Computers to FDA Regulated Medical Devices

Wednesday, June 18th, 2008

Pete Gordon asked a couple of questions regarding FDA regulations for Internet-based reporting software that interface with medical devices. The questions are essentially:

  1. How much documentation (SRS, SDS, Test Plan) is required and at what stage can you provide the documentation?
  2. How does the FDA view SaaS architectures?

The type of software you're talking about has no real FDA regulatory oversight. The FDA has recently proposed new rules for connectivity software. I've commented on the MDDS rules, but Tim has a complete overview here: FDA Issues New MDDS Rule. As Tim notes, if the FDA puts the MDDS rules into place and becomes more aggressive about regulation, many software vendors that provide medical device interfaces will be required to submit 510(k) premarket notifications.

Dealing with the safety and effectiveness of medical devices in complex networked environments is on the horizon. IEC 80001 (and here) is a proposed process for applying risk management to enterprise networks incorporating medical devices.  My mantra: High quality software and well tested systems will always be the best way to mitigate risk.

Until something changes, the answer to question #1 is that if your software is not a medical device, you don't need to even deal with the FDA. The answer to question #2 is the same. The FDA doesn't know anything about SaaS architectures unless it's submitted as part of a medical device 510(k).

I thought I'd take a more detailed look at the architecture we're talking about so we can explore some of the issues that need to be addressed when implementing this type of functionality.

[Diagram: medical devices connected through a communications server to an EMR system and web clients]

This is a simplified view of the way medical devices typically interface to the outside world. The Communications Server transmits and receives data from one or more medical devices via a proprietary protocol over whatever media the device supports (e.g. TCP/IP, USB, RS-232, etc.).

In addition to having local storage for test data, the server could pass data directly to an EMR system via HL7 or provide reporting services via HTTP to a Web client.

There are many other useful functions that external software systems can provide. By definition though, an MDDS does not do any real-time patient monitoring or alarm generation.

Now let's look at what needs to be controlled and verified under these circumstances.

  1. Communications interaction with proper medical device operation.
  2. Device communications protocol and security.
  3. Server database storage and retrieval.
  4. Server security and user authentication.
  5. Client/server protocol and security.
  6. Client data transformation and presentation to the user (including printed reports).
  7. Data export to other formats (XML, CSV, etc.).
  8. Client HIPAA requirements.

Not only is the list long, but these systems involve the combination of custom written software (in multiple languages), multiple operating systems, configurable off-the-shelf software applications, and integrated commercial and open source libraries and frameworks. Also, all testing tools (hardware and software) must be fully validated.

One of the more daunting verification tasks is identifying all of the possible paths that data can take as it flows from one system to the next. Once identified, each path must be tested for data accuracy and integrity as it's reformatted for different purposes, communications reliability, and security. Even a modest one-way store-and-forward system can end up with a hundred or more unique data paths.

A full set of requirements, specifications, and verification and validation test plans and procedures would need to be in place and fully executed for all of this functionality in order to satisfy the FDA Class II GMP requirements. This means that all of the software and systems must be complete and under revision control. There is no "implementation independent" scenario that will meet the GMP requirements.

It's no wonder that most MDDS vendors (like EMR companies) don't want to have to deal with this. Even for companies that already have good software quality practices in place, raising the bar up to meet FDA quality compliance standards would still be a significant organizational commitment and investment.

The Benefits of Software Validation

Monday, March 24th, 2008

Many people still confuse verification (was the product built right?) and validation (was the right product built?). The benefits of both of these activities are well covered in Software Validation: Turning Concepts into Business Benefits:

Potential benefits of software validation and verification.

Software V&V is a FDA requirement, but the same methodologies can be used to improve non-device software as well.

Software Development: Driven by what?

Sunday, February 17th, 2008

First a definition:

driv·en –adjective

  1. : having a compulsive or urgent quality
  2. : propelled or motivated by something — used in combination <results-driven>

Driven software development methodologies abound: test-driven (TDD), behavior-driven (BDD), model-driven (MDD), and many more.

Many of these are encompassed by the iterative Agile software development methodologies. Collectively they are sometimes referred to as the XDD acronyms. As you might expect, these along with all of the other competing, contrasting, and overlapping development philosophies can cause a software developer much consternation. Confessions of a Software Developer is a good example of the overload that can occur.

My reason for bringing up driven methodologies is not to complain about being overwhelmed by them (which, like most others, I am). It's simply to point out the contradiction of X-Driven with the Merriam-Webster definition. I think this will help us better understand what should really be driving us.

Look closely at definition #2. Propelled or motivated by something ... results-driven. What is that something? Ah ha!

The fundamental motivation for all of these development approaches is to:

Improve productivity and quality.

This is the result, the goal. Behavior, Model, Test, etc. are all just the means by which we are trying to achieve this desired result. It's the result that we're driven towards, not the methods and techniques we use to get there.

So, in order to make this distinction clear and to eliminate confusion in the future, I propose that all these methodologies be renamed from Driven to Guided. Think of them like you would a GPS system in your car, except these will allow you to find software Nirvana. TDD is now TGD, and the whole lot is known as XGD.

The point here is that you should not let any particular development philosophy blind you to what the real purpose of using it is in the first place. Being guided by a methodology helps me remember that better than being driven by it. Also, the whole concept of being driven seems exclusionary to me. You shouldn't hesitate to use the pieces and parts of any combination of these techniques that best suits your needs.

Understanding Software Defects

Sunday, November 25th, 2007

We tend to focus a lot of attention on tools and methodologies for improving software quality. I thought it would be worthwhile taking a step back to try to understand what the root causes of software defects are. Fortunately, there have been decades of research analyzing the most common sources of software defects.

After also looking at some related development sins, I'll summarize what this new understanding means to me as a software developer.

An often-cited article in IEEE Computer is Software Defect Reduction Top-10 List (Vol. 34, Issue 1, January 2001, 135-137). Here's a summary (from Software Engineering: Barry W. Boehm's Lifetime Contributions to Software Development, Management, and Research):

  1. Developers take 100 times less effort to find and fix a problem than one reported by a customer.
  2. Half of software project work is wasted on unnecessary rework.
  3. Twenty percent of the defects account for 80% of the rework.
  4. Twenty percent of modules account for 80% of the defects and half the modules have no defects.
  5. Ninety percent of the downtime comes from 10% of the defects.
  6. Peer reviews catch 60% of the defects.
  7. Directed reviews are 35% more effective than nondirected ones.
  8. Discipline can reduce defects by 75%.
  9. High-dependability modules cost twice as much to produce as low-dependability ones.
  10. Half of all user programs contain nontrivial defects.

This list is based on empirical research and is a good starting point for understanding how to avoid predictable pitfalls in the software development process.

A broader perspective is provided by Pursue Better Software, Not Absolution for Defective Products -- Avoiding the "Four Deadly Sins of Software Development" Is a Good Start. Here are the four deadly sins:

The First Deadly Sin: Sloth -- Who Needs Discipline?

The Second Deadly Sin: Complacency -- The World Will Cooperate with My Expectations.

The Third Deadly Sin: Meagerness -- Who Needs an Architecture?

The Fourth Deadly Sin: Ignorance -- What I Don’t Know Doesn’t Matter.

The SEI article concludes:

We believe that the practice of software engineering is sufficiently mature to enable the routine production of near-zero-defect software.

🙂 How can you not smile (or even LOL) at that statement? Despite that, I like the reduction of the problem into its most basic elements: human shortcomings. That's why the conclusion is so preposterous -- software development is a human activity, and a complex one at that. You're trying to produce a high quality software solution that meets customer expectations, which is a difficult thing to do.

Another list of software development sins can be found in The 7 Deadly Sins of Software Development.
#1 - Overengineering (in complexity and/or performance)
#2 - Not considering the code's readership
#3 - Assuming your code works
#4 - Using the wrong tool for the job
#5 - Excessive code pride
#6 - Failing to acknowledge weaknesses
#7 - Speaking with an accent (naming conventions)

There are some tool/language specific items here, but this list generally follows the same trend of discovering typical developer shortcomings that can be avoided.

Another source of software defects is poor project planning. More sins (deadly again) can be found in the Steve McConnell article: Nine Deadly Sins of Project Planning.

It's pretty easy to see from these categorizations where a lot of the software development and management techniques, tools, and practices came from. As you might have expected, many are focused on human behavior and communication as a key component for improving software quality. For example, take the Agile Manifesto:

Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan

This vision is very telling about what the manifesto writers considered to be a primary cause of software defects.

Another perspective is Fred Brooks' famous 1986 'No Silver Bullet' paper (also see here) that distinguishes "accidental" repetitive tasks from "essential" tasks. From the article:

There is no single development, in either technology or in management technique, that by itself promises even one order-of-magnitude improvement in productivity, in reliability, in simplicity.

Even after twenty years of significant progress in software engineering, I believe that this is still a true statement.

Conclusion:

There are many complex factors that contribute to software defects. There is clearly no one-size-fits-all solution. As a developer, this means that I have to:

  1. Be aware of my own shortcomings and biases.
  2. Continually try to improve my development, communication, and management skills.
  3. Understand how each new tool or methodology that claims to improve software quality fits into the bigger picture of what I (both personally and as an organization) am trying to accomplish.

Ever heard of FRACAS?

Saturday, October 27th, 2007

FMEA (Failure Mode and Effects Analysis) is a regular part of our development process (we call it "Hazard Analysis"), but I was unfamiliar with FRACAS (Failure Reporting, Analysis, and Corrective Action Systems) until I ran across this: 10/21/2007 FRACAS? – Never heard of it.

I'd also never heard of "software reliability growth". The model is described here, and there are also some other good links available from the Google search.

I'm a software developer, not a quality systems person. According to Jan, even medical device quality people are not that familiar with these methods. In addition to Jan's explanation for this, one reason I (or others) might not have heard of these is that my experience has been exclusively in Class II non-invasive diagnostic devices. The development of Class III life support and implantable devices is a whole different animal when it comes to quality control rigor.

CAPAs (Corrective and Preventive Actions) are of course part of the standard FDA Quality System Regulations (§ 820.100). These procedures are primarily implemented to deal with quality problems after the product has been released to the field.

The concept of preventing recurrence of observed errors (FRACAS) during the development process is certainly an interesting one. The difference between CAPA and FRACAS is similar to the argument I made regarding Software Forensics -- these techniques should be used to ensure quality before the product is released.

Medical Device Software Forensics

Wednesday, October 17th, 2007

"The Gray Sheet" has an article called CDRH Software Forensics Lab: Applying Rocket Science To Device Analysis. Can it really be true that CDRH is doing static code analysis for detecting software defects in recall investigations?

Static code analysis has been around for a long time. I remember using Lint back in the old days. Nowadays all commonly used computer languages (like C and C++) have fairly advanced analysis tools. For Microsoft .NET languages there are tools like FxCop and ReSharper. These tools are great for spotting trouble areas and maintaining code conventions during the development process.

I just have a hard time imagining a process using these tools that would successfully detect real defects in complex medical device software. Especially software that's controlling hardware devices, doing communications tasks, and recording sensor data. Also, what about all of the assembly language code for embedded processors and DSPs?

Anyway, from the article:

A recent review in the forensics lab found 180 "questionable constructs" in 100,000 lines of code, but only two turned out to be real design issues, Taylor said.

He also pointed to two other cases where static analysis of the software did not find any bugs, thus clearing software as a root cause candidate in the recall investigations.

These statements raise the following questions:

  1. Were the two "real design issues" related at all to the device failures?
  2. Just because no static analysis "bugs" were found, why does that exonerate the software from being the cause of the failure?

I'm not impressed that static analysis has "been embraced by the aeronautical and automotive sectors". IMHO this approach just seems like it would create a lot of work for very little return. The manufacturer that produced the device should be responsible for tracking down and fixing the software (and/or hardware) defects.

I'll stop now.

UPDATE (25-Oct-07):

Check out Tim's post on this subject: FDA Raises Bar on Medical Device Software Testing.