Improving EEG Analysis

The Technology Review article Better Brain-Wave Analysis looks at start-up ElMindA, which is trying to find new quantitative methods for broadening the clinical use of EEG.

The company has developed a novel system that calculates a number of different parameters from EEG data, such as the frequency and amplitude of electrical activity in particular brain areas, the origin of specific signals, and the synchronicity in activity in two different brain areas as patients perform specific tests on a computer.

This description doesn't sound very novel, but I've always felt that EEG analysis has tremendous clinical potential. This is particularly true for rehabilitation purposes (like the stroke example) and EEG-based communications for paralysis patients.

I am skeptical of "objective diagnosis" claims for things like attention-deficit hyperactivity disorder (ADHD), though. In the 1980s, EEG topography was thought to be able to distinguish some psychiatric disorders. Those claims were never proven to be true.

I'm not saying that new quantitative techniques like those being developed by ElMindA are comparable to the old EEG "brain mapping", but significant validation will be required before they can be used clinically.

UPDATE (6/3/09): Another related article, with a cool picture, also from Technology Review: Reading the Surface of the Brain.

UPDATE (6/10/09): The BrainGate project at Massachusetts General Hospital has recently started clinical trials that may help paralyzed patients. From the description of the ongoing MGH project:

... the ultimate goals of which include "turning thought into action": developing point and click capabilities on a computer screen, controlling a prosthetic limb and a robotic arm, controlling functional electrical stimulation (FES) of nerves disconnected from the brain due to paralysis, and further expanding the neuroscience underlying the field of intracortical neurotechnology.

UPDATE (7/5/09): This is more related to Brain Control Headsets, but if you're interested in developing your own EEG-based controller you should check out An SDK for your brain. The free NeuroSky MindSet Development Tools along with a $200 headset will get you started developing your own "mind-controlled" game. Good luck with that!

Posted in EEG | 1 Comment

Guest Article: Static Analysis in Medical Device Software (Part 1) — The Traps of C

Any software-controlled device that is attached to a human presents unique and potentially life-threatening risks. A recent article on the use of static analysis for medical device software prompted Pascal Cuoq at Frama-C to share his thoughts on the subject. This is part 1 of 3.

The article Diagnosing Medical Device Software Defects Using Static Analysis gives an interesting overview of the applicability of static analysis to embedded medical software. I have some experience in the field of formal methods (including static analysis of programs), and absolutely none at all in the medical domain. I can see how it would be desirable to treat software involved at any stage of a medical procedure as critical, and coincidentally, producing tools for managing critical software has been my main occupation for the last five years. This blog post constitutes the first part of what I have to say on the subject, and I hope someone finds it useful.

As the article states, in the development of medical software, as in many other embedded applications, C and C++ are used predominantly, for better or for worse. The "worse" part is an extensive list of subtle and less subtle pitfalls that seem to lie in every corner of these two languages.

The most obvious perils can be avoided by restricting the programmer to a safer subset of the language -- especially if it is possible to recognize syntactically when a program has been written entirely in the desired subset. MISRA C, for instance, defines a set of rules, most of them syntactic, that help avoid the obvious mistakes in C. But only a minority of C's pitfalls can be eliminated so easily. A good sign that coding style standards are no silver bullet is that there exist so many. Any fool can invent theirs, and some have. The returns of mandating more and more coding rules diminish rapidly, to the point that overdone recommendations found in the wild contradict each other, or in the worst case, contradict common sense.
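
As a hypothetical illustration (mine, not an example from the article or from MISRA's actual rule text), here is the kind of one-line slip that a purely syntactic rule can catch mechanically:

    /* Sketch: the classic assignment-where-a-comparison-was-intended slip.
       A syntactic rule that forbids assignment inside a condition
       (MISRA C has a rule along these lines) flags it mechanically. */
    void check(int x)
    {
        if (x = 0) {       /* bug: assigns 0 to x, so the branch never runs */
            /* ... */
        }
        if (x == 0) {      /* the comparison that was intended */
            /* ... */
        }
    }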

Even when written according to a reasonable development standard, a program may contain bugs liable to result in run-time errors. Worse, such a bug may, in some executions, fail to produce any noticeable change, and in other executions crash the software. This lack of reproducibility means that a test may fail to reveal the problem even if the problematic input vector is used.

A C program may yet hide other dangerous behaviors. The ISO 9899:1999 C standard, the bible for implementers of C compilers and C analyzers alike, distinguishes "undefined", "unspecified", and "implementation-defined" behaviors. Undefined behaviors correspond roughly to the run-time errors mentioned above. The program may do anything if one of these occurs, because the standard does not define what it should do. A single undefined construct may cause the rest of the program to behave erratically in apparently unrelated ways. Proverbially, a standard-compliant compiler may generate a program that causes the device to catch fire when a division by zero happens.
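
To make the proverbial fire concrete, here is a two-line sketch of my own (the function name is mine, not the article's):

    /* Sketch: integer division is only defined when the divisor is
       non-zero.  When b == 0 the standard places no constraint at all
       on what the compiled program does next. */
    int ratio(int a, int b)
    {
        return a / b;   /* undefined behavior if b == 0 */
    }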

Implementation-defined behaviors represent choices that are not imposed by the standard, but that have to be made by the compiler once and for all. In embedded software, it is not a viable goal to avoid implementation-defined constructions: the software needs to use them to interface with the hardware. Additionally, size and speed constraints for embedded code often force the developer to use implementation-defined constructs even where standard constructs exist to do the same thing.

However, in the development of critical systems, the underlying architecture and compiler are known before software development starts. Some static analysis techniques lend themselves well to this kind of parameterization, and many available tools that provide advanced static analysis can be configured for the commonly available embedded processors and compilers. Provided that the tests are made with the same compiler and hardware as the final device, the existence of implementation-defined behaviors does not invalidate testing as a quality assurance method, either.

Unspecified behaviors are not treated as seriously as they should be by many static analysis tools. That's because, unlike undefined behaviors, they cannot set the device on fire. Still, they can cause different results from one compilation to another, from one execution to another, or even, when they occur inside a loop, from one iteration to another. Like the trickiest of run-time errors, they lessen the value of tests because they are not guaranteed to be reproducible.
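
A textbook instance (again my sketch, not one from the article) is the order of evaluation of sub-expressions:

    /* Sketch: f and g are assumed to be defined elsewhere and to have
       side effects.  The standard leaves unspecified whether f() or
       g() is called first, so two compilers -- or two compilations
       with the same compiler -- may legitimately behave differently. */
    int f(void);
    int g(void);

    int sum_fg(void)
    {
        return f() + g();
    }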

The "uninitialized variable" example in the list of undesirable behaviors in the article is in fact an example of unspecified behavior. In a minimal program along the following lines, the local variable L has a value; it is just unknown which one.
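
    /* A minimal sketch of the example: L is deliberately left
       uninitialized, so it holds some unknown value. */
    int main(void)
    {
        int L;
        return L - L;
    }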

Computing (L-L) in this example reliably gives a result of zero.

Note: For the sake of brevity, people who work in static analysis have a tendency to reduce their examples to the shortest piece of code that exhibits the problem. (In fact, in writing this blog post I realized I could write an entirely separate post on the deformation of language in practitioners of static analysis.) Coming back to the subject at hand: of course, no programmer wants to compute zero by subtracting an uninitialized variable from itself. But a cryptographic random generator might, for instance, initialize its seed variable by mixing external random data with the uninitialized value, getting at least as much entropy as the external source provides, and perhaps more. The (L-L) example should be considered as standing in for this use and all other useful uses of uninitialized values.

Knowledge of the compilation process and lower-level considerations may be necessary in order to reliably predict what happens when uninitialized variables are used. If the local variable L were declared of type float, the actual bit sequence found in it at run-time could happen to represent IEEE 754's NaN or one of the infinities, in which case the result of (L-L) would be NaN.
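
A sketch of that variant:

    /* Sketch: if the uninitialized bits of L happen to encode a NaN,
       the subtraction yields NaN rather than 0.0f, because
       NaN - NaN == NaN. */
    float fdiff(void)
    {
        float L;
        return L - L;
    }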

Uninitialized variables, and more generally unspecified behaviors, are indeed less harmful than undefined behaviors. Some "good" uses for them are encountered from time to time. Still, we argue that critical software should not exhibit any unspecified behavior at all. Uses of uninitialized variables can be excluded by a simple syntactic rule ("all local variables must be initialized at declaration") or, if material constraints on the embedded code make this price too high to pay, with one of the numerous static analyzers that reliably detect any use of an uninitialized variable. Note that because of C's predominant use of pointers, it may be harder than it superficially appears to determine whether a variable is actually used before being initialized, even in ordinary programs.

There are other examples of unspecified behaviors not listed in the article, such as the comparison of addresses that are not inside the same aggregate object, or the comparison of an invalid address to NULL. I am in fact still omitting details here; see the carefully worded §6.5.8 in the standard for the actual conditions.

An example of the latter unspecified behavior is (p == NULL) where p contains an invalid address computed as t+12345678 (t being a char array with only 10000000 cells). This comparison may produce 1 when t happens to have been located at a specific address by the compiler, typically UINT_MAX-12345677. It also produces 0 in all other cases. If there is an erroneous behavior that manifests itself only when this condition produces 1, a battery of tests is very unlikely to uncover the bug, which may remain hidden until after the device has been widely deployed.
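
In code, the scenario described above looks something like this (my sketch; the names are illustrative):

    char t[10000000];   /* an array with only 10000000 cells */

    int is_null(void)
    {
        char *p = t + 12345678;   /* far past the end of t: invalid */
        return p == NULL;         /* unspecified: 0 almost always, but 1
                                     if the compiler happens to place t
                                     at address UINT_MAX - 12345677 */
    }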

An example of comparison of addresses that are not in the same aggregate object is the comparison (p <= q), when p and q are pointers to memory blocks that have both been obtained by separate calls to the allocation function malloc. Again, the result of the comparison depends on uncontrolled factors. Assume such a condition made its way by accident into a critical function. The function may have been unit-tested exhaustively, but the unit tests may not have taken into account the previous sequence of block allocations and deallocations that results in one block being positioned before or after the other in the heap. A typical static analysis tool is smarter, and may consider both possibilities for the result of the condition, but we argue that in critical software the fact that the result is unspecified should in itself be reported as an error.
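
A sketch of such a condition (error handling omitted for brevity):

    #include <stdlib.h>

    int first_in_heap(void)
    {
        char *p = malloc(100);
        char *q = malloc(100);
        /* p and q point into two separate objects, so the result of
           the comparison is unspecified: it depends on heap layout and
           on the prior history of allocations and deallocations. */
        int before = (p <= q);
        free(p);
        free(q);
        return before;
    }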

Another failure mode for programs written in C or any other algorithmic language is the infinite loop. In embedded software, one is usually interested in an even stronger property than the absence of infinite loops: the verification of a predetermined bound on the execution time of a task. Detection of infinite loops is a famous example of an undecidable problem. Undecidable problems are problems for which it is mathematically impossible to provide an algorithm that, for any input (here, a program to analyze), eventually answers "yes" or "no". People moderately familiar with undecidability sometimes assume this means it is impossible to make a static analyzer that provides useful information on the termination of an analyzed program, but the theoretical limitation can be worked around by accepting a little imprecision (false negatives, or false positives, or both, in the diagnostic), or by allowing that the analyzer itself will, in some cases, not terminate.

The same people who recognize termination of the analyzed program as an undecidable property, for which theory states that a perfect analyzer cannot be made, usually fail to recognize that precisely detecting run-time errors or unspecified behaviors is an undecidable problem as well. For these questions, too, it is mathematically impossible to build an analyzer that always terminates and emits neither false positives nor false negatives.

Computing the worst-case execution time is at least as hard as verifying termination, so it is undecidable too. That's the theory. In practice, there exist useful static analyzers that provide guaranteed worst-case execution times for a piece of software. They achieve this by limiting the scope of the analysis, firstly, to the style of code that is common in embedded software, and secondly, to the one sub-task whose timing is important. This kind of analysis cannot be achieved using the source code alone. The existing analyzers all use the binary code of the task at some point, possibly in addition to the source code, a sample of the processor to be used in the device, or only an abstract description of the processor.

This was part one of the article, where I tried to provide a list of issues to look for in embedded software. In part two, I plan to talk about methodology. In part three, I will introduce formal specifications, and show what they can contribute to the issue of software verification.

Posted in Programming, Software Quality | 4 Comments

Continuous Learning: 14 Ways to Stay at the Top of Your Profession

"Professional development refers to skills and knowledge attained for both personal development and career advancement. " I'm fortunate in that my personal and career interests are well aligned. I must enjoy my work because I do a lot of the same activities with a majority of my free time (just ask my wife!).

Keeping up with an industry's current technologies and trends is a daunting task. Karl Seguin's post Part of your job should be to learn got me thinking about the things I do to stay on top of my interests. I never really thought about it much before, but as I started making a list I was surprised by how fast it grew. When it reached a critical mass, I thought it would be worth sharing.

I actually have two professions. I'm a Biomedical Engineer (formal training) and a Software Engineer (self-proclaimed). I primarily do software design and development, but being in the medical device industry also requires that I keep abreast of regulatory happenings (the FDA in particular, HIPAA, etc.), quality system issues, and industry standards (e.g. HL7).

Keeping track of Healthcare IT trends is also a big task. With the new emphasis by the federal government on EMR adoption, even a small company like mine has started planning and investing in the future demand for medical device integration.

The other major topic of interest to me is software design and development methodologies. A lot of the good work in this area seems to come from people that are involved in building enterprise class systems. I've discussed the ALT.NET community (here) and still think they are worth following.

So here's my list.  I talk about them with respect to my interests (mostly software technologies), but I think they are generally applicable to any profession.

1. Skunk Works

Getting permission from your manager to investigate new technologies that could potentially be used by your company is win-win. In particular, if you can parlay your new-found skills into a product that makes money (for the company, of course), then it's WIN-WIN.

In case you've never heard this phrase:  Skunk works.

2. Personal Projects

I always seem to be working with a new software development tool or trying to learn a new programming language. Even if you don't become an expert at them, I think hands-on exposure to other technologies and techniques is invaluable. It gives you new perspectives on the things that you are an expert in.

Besides getting involved in an open source project, people have many interesting hobby projects.  See Do you have a hobby development project? for some examples.

3. Reading Blogs

I currently follow about 40 feeds on a variety of topics. I try to remove 2-3  feeds and replace them with new ones at least once a month. Here is my Google Reader trend for the last 30 days:

30 day RSS trend

You can see I'm pretty consistent. That's 1605 posts in 30 days, or about 53 posts per day. To some, this may seem like a lot. To others, I'm a wimp. During the week I usually read them over lunch or in the evening.

4. Google Alerts

Google Alerts is a good way to keep track of topics and companies of interest. You get e-mail updates with news and blog entries that match any search term. For general search terms use 'once a day' and for companies use 'as-it-happens'.

5. Social Networks

I joined Twitter over a month ago. The 30 or so people I follow seem to have the same interests as I do. What's more important is that they point me to topics and reference sites that I would not have discovered otherwise. I've dropped a few people who were overly verbose or whose tweets were mostly inane (like "I'm going to walk the dog now.").

I'm also a member of LinkedIn. Besides connecting with people you know, there are numerous groups you can join to track topical discussions. Unfortunately, there are quite a few recruiters on LinkedIn, which somewhat diminishes the experience for me.

I don't have a Facebook account because my kids told me you have to be under 30 to join. Is that true? 🙂

6. Books

I browse the computer section of the bookstore on a regular basis.  I even buy a technical book every now and then.

Downloading free Kindle e-books is another good source, e.g. here are a couple via Karl's post: Foundations of Programming. There's a lot of on-line technical reading material around, and having a variety on the Kindle allows me to read them whenever the mood strikes me. One caution though: the Amazon conversion from PDF and HTML to e-book format is usually not very good. This is particularly true for images and code. But still, it's free -- you get what you pay for.

7. Magazines

There are numerous technical print publications around, but they are becoming rare because of the ease of on-line alternatives. I used to get Dr. Dobb's Journal; they no longer publish a print version, though it is still available electronically.

I miss that great feeling of cracking open a fresh nerd magazine. I still remember the pre-Internet days when I had stacks of BYTE lying around the house.

8. Webinars

These tend to be company-sponsored, but content about a product or service that you may not know a lot about is a good way to learn a new subject. You just have to filter out the sales pitch. You typically get an e-mail invitation for these directly from a vendor.

9. Local User Groups

I've talked about this before (at the end of the post).  In addition to software SIGs, look into other groups as well. For me, IEEE has a number of interesting lectures in the area.

Face-to-face networking with fellow professionals is very important for career development ("It's not what you know -- it's who you know" may be a cliché, but it's true). Go and participate as much as possible.

If there's not a user group in your area that covers your interests, then start your own! For example: Starting a User Group, Entry #1 (first entry of 4).

10. Conferences and Seminars

Press your employer for travel and expenses, and go when you can. This is another win-win for both of you.  Like Webinars, vendor sponsored one day or half day seminars can be valuable.  Also, as in #9, this is another opportunity to network.

Just getting out of the office every now and then is a good thing.

11. Podcasts

These may be good for some people, but I rarely listen to podcasts. My experience is that the signal-to-noise ratio is very low (well below 1). You have to listen to nonsense for long periods of time before you get anything worthwhile. But that's just me. Maybe I don't listen to the right ones?

12. Discussion Sites

CodeProject and Stack Overflow are my favorites. Also, if you do a search at Google Groups you can find people talking about every conceivable subject.

Asking good questions and providing your expertise for answers is a great way to show your professionalism.

13. Blogging

IMO your single most important professional skill is writing. Having a blog that you consistently update with material that interests you is a great way to improve your writing skills.  It forces you to organize your thoughts and attempt to make them comprehensible (and interesting) to others.

14. Take a Class

If you have a University or College nearby, they probably have an Extension system that provides classes. Also, there are free on-line courses available, e.g. from Stanford, MIT, and U. of Wash.

UPDATE (6/23/09): Here's some more fuel for #13: The benefits of technical blogging. All good points.

——
CodeProject Note:  This is not a technical article but I decided to add the 'CodeProject' tag anyway. I thought the content might be of general interest to CPians even though there's no code here.

Posted in General, Programming | 5 Comments

Liberate the Data!

Peter Neupert's post Tear Down the Walls and Liberate the Data is worth reading. There are some Microsoft-centric comments, but a number of the linked articles are good and the overall message is correct (IMO anyway).

I might have tried to find a better analogy than 'tear down this wall', but that's because I was never a Ronald Reagan fan.  Nevertheless, this gets across the primary point:

What’s of paramount importance is liberating the data and making it available for re-use in different contexts.

Two major 'walls' stand in the way of this:

  1. “it’s-my-data”
  2. “waiting-for-the-right-standards-set-by-government”

Both exist because of the perceived competitive advantages they provide to organizations and vendors.

Interoperability of data, or enabling data to become "liquid," would allow it to flow easily from system to system. These challenges are the same ones addressed by Adam Bosworth that I discussed in Dreaming of Flexible, Simple, Sloppy, Tolerant in Healthcare IT.

The technical issues are complicated, but I also believe that they are not the primary reason that health IT systems fail to interoperate. As Peter suggests, it would be good for HiTech dollars to be used to break down some of the more difficult barriers that prevent data liquidity.

The "proven model[s] for extracting and transforming data" do exist and there is no excuse not to use them.

After thinking about it some more, a more cautionary analogy may be the Exodus -- Moses leading the Israelites out of the Land of Egypt ("let my data go!"). 1) It took an act of God to part the Red Sea, and 2) after their dramatic escape they roamed the desert for 40 years. Let's hope that health IT interoperability does not need divine intervention or suffer the same fate.

Posted in EMR, Interoperability, Microsoft | 2 Comments

Software Verification vs. Validation

For some reason it just really bugs me that these two terms are incorrectly interchanged so frequently.

Part of the problem is that the document General Principles of Software Validation; Final Guidance for Industry and FDA Staff (2002) does not do a good job of differentiating actual verification and validation activities. They just call everything validation.

The recent MD&DI article Building Quality into Medical Device Software provides a pretty good overview of these regulatory requirements, but is another case in point. The article talks about "software validation" at every step, just like the FDA document.

Another similar article on this subject is Software Validation: Turning Concepts into Business Benefits. It is also confused, e.g. (my highlight):

... software validation involves the execution of tests designed to cover each of the specific system requirements.

No, testing specific requirements is a verification activity! It's no wonder most people are confused.

These definitions, Difference between Verification and Validation, are better as they highlight the sequencing of activities:

Verification takes place before validation, and not vice versa. Verification evaluates documents, plans, code, requirements, and specifications. Validation, on the other hand, evaluates the product itself.

From here (warning: PDF):
verification vs. validation

Validation activities (usability testing, user feedback, etc.) are much harder to define, execute, and document properly than most verification testing.

Here are the golden rules:

Verification: was the product built right?

Validation: was the right product built?

I guess I should get over it...

UPDATE (5/12/09):  Good definitions from here: Diagnosing Medical Device Software Defects Using Static Analysis:

Verification and validation are terms that are often used in software. However, it is important to understand the difference between these two distinct but complementary activities. Software verification provides objective evidence that the design outputs of a particular phase of the software development life cycle meet all of the specified requirements for that phase by checking for consistency, completeness, and correctness of the software and its supporting documentation. Validation, on the other hand, is the confirmation by examination and provision of objective evidence that software specifications conform to user needs and intended uses, and that the particular requirements implemented through software can be consistently fulfilled.

UPDATE (8/6/09): The importance of proper V&V cannot be overstated. The FDA is watching: FDA still enforcing regulations for validation of enterprise software.

UPDATE (2/11/10): I just noticed that the guidance document link on the FDA site was changed and fixed it. When I reviewed the document I found that even though it was "issued" in Jan. 2002 it had been recently updated (11/6/09). The later sections (4, 5, and 6) still use the term validation generically, but the updated document does distinguish between verification and validation:

3.1.2 Verification and Validation

The Quality System regulation is harmonized with ISO 8402:1994, which treats "verification" and "validation" as separate and distinct terms. On the other hand, many software engineering journal articles and textbooks use the terms "verification" and "validation" interchangeably, or in some cases refer to software "verification, validation, and testing (VV&T)" as if it is a single concept, with no distinction among the three terms.

Software verification provides objective evidence that the design outputs of a particular phase of the software development life cycle meet all of the specified requirements for that phase. Software verification looks for consistency, completeness, and correctness of the software and its supporting documentation, as it is being developed, and provides support for a subsequent conclusion that software is validated. Software testing is one of many verification activities intended to confirm that software development output meets its input requirements. Other verification activities include various static and dynamic analyses, code and document inspections, walkthroughs, and other techniques.

Software validation is a part of the design validation for a finished device, but is not separately defined in the Quality System regulation. For purposes of this guidance, FDA considers software validation to be "confirmation by examination and provision of objective evidence that software specifications conform to user needs and intended uses, and that the particular requirements implemented through software can be consistently fulfilled." In practice, software validation activities may occur both during, as well as at the end of the software development life cycle to ensure that all requirements have been fulfilled.

Posted in FDA, Software Quality | 28 Comments

All Atwitter

I'm finally all atwitter.  Better late than never. Use the link on the lower right or click here:

Follow me on Twitter

It seems like it will be a good place for quick thoughts, links, and discussion.

Posted in General | Leave a comment

Contradictory Observations and Electronic Medical Records

Martin Fowler has an interesting discussion in his ContradictoryObservations post. This little slice of medically related software design insight is particularly relevant because it highlights (at least for me) the complexity of using electronic medical records and of making them interoperable.

In a broader sense I suppose it also shows some of the underlying difficulties that face the Obama administration's new EMR adoption push.  But I'm not going there.

The concepts of observations, rejection, and evidence are good, but they're just the tip of the iceberg.

Even after you've modeled the data interactions, how do you effectively communicate these concepts to the user?  Or to another EMR that doesn't know about your model or how it's used?

Martin's view is that:

Most of the time, of course, we don't use complicated schemes like this. We mostly program in a world that we assume is consistent.

Unfortunately, many of the issues facing electronic medical records do require complex solutions. And even when the world is consistent, how you implement a solution may be (actually, will probably be) very different from how I implement it. Either way, interoperability will always be a challenge.

We're going to need lots of good software design tools to solve these problems.

Posted in EMR, Interoperability, Programming | 4 Comments

Kindle 2

I'm a big reader. When my family told me that they had ordered a Kindle for me I was pretty excited. Well, that was over two months ago. We were notified a couple of weeks ago that they would be shipping a Kindle 2. It finally arrived!

The Kindle 2 hardware has a modern sleek look and feels great in your hands. At 10.2 ounces it's lighter than most of the books I read.

Here are the things I like:

  • Electronic-ink: The displayed text looks like a real book and can be read anywhere you'd normally be reading. You can also change the font size to your liking. Because there's no backlight, it's easy on the eyes.
  • Whispernet: The Amazon broadband 3G network is subscription-free when you buy the device. This gives you access to the "Kindle Store" as well as the rest of the internet (beware: see Browser notes below). Being able to download purchased books and free previews is the ultimate in instant gratification. There are a number of free e-book sites (e.g. Feedbooks)  that can also be accessed directly from the Kindle.
  • Search, bookmarks, and annotations: At first you might think that not being able to "flip" through the pages of a book and spot interesting content would be a negative. That may be true for some people. In the long term though, being able to easily find long-forgotten content, along with your own bookmarks and thoughts, is a significant value-add. I see these features as a real game changer. Here's a simple example: how many times have you purchased a book that you already own? With an e-reader, it won't happen again.
  • Built-in Dictionary: I'm always looking up words. With the Kindle, you just point at the word and the definition appears at the bottom of the page. Sweet!
  • Personal files: Amazon provides an e-mail based service for converting common file formats (e.g. PDF, DOC, etc.) into e-book files. This is very handy for keeping reference and personal material on the device.

Here are the things that I've found annoying:

  • Electronic-ink: Static pages are great, but all page turns have a noticeable "blink". Apparently the screen must be blanked to all black before updated text is drawn. You do get used to it though. The other major issue is the display problems with menus and cursor movement: the pop-up menu is sometimes not completely displayed, and you can end up with multiple pointers on the screen if you move the cursor too quickly.
  • 5-way Pointer and Page buttons: The 5-way pointer takes some getting used to. I accidentally purchased a book when I tried to move the pointer up but inadvertently pressed select. Fortunately, I would have bought the book anyway. The page buttons (Next, Prev, and Home) seem kind of tight -- you have to snap down pretty hard to activate them. They look nice, but feel clunky.
  • Keyboard: It works, but the keys are very small and hard to use.
  • Web Browser: The "experimental" Basic Web browser is completely unusable! It doesn't render anything properly and is impossible to navigate. On the Experimental page it says that it "Works best with web sites that are mostly text." And what sites would those be? None!

The text-to-speech feature seems to work -- the voice isn't very natural though, and it would get on my nerves quickly. But that's OK because I don't plan on using this feature.

IMHO, the benefits of electronic book reading and content management far outweigh the annoyances of the Kindle. You're not going to be browsing the Web or typing e-mails with this device.  The Kindle is for reading books! If you expect more than that you should consider purchasing a different device.

UPDATE (3/2/09): Here's a thorough review of reading newspapers on the Kindle 2: Reading the New York Times on Kindle 2.

Posted in General | 1 Comment

Is Google a Monopoly? Just ask Stack Overflow (and me).

Today's New York Times Digital Domain: Everyone Loves Google, Until It's Too Big quotes Jeff Atwood, probably based on this post: The Elephant in the Room: Google Monoculture.

It's interesting that they picked Stack Overflow as an example because even Jeff says:

Now, I don't claim that Stack Overflow is representative of every site on the internet -- obviously it isn't.

I don't know, Jeff -- I think you're being too modest. This blog doesn't have anywhere near the number of visits that SO does, but 95.87% of its search traffic for the last month was from Google. Based on an N of 2 then, I'd say that Google does have a monopoly on Internet searching!

UPDATE (3/5/09):

Is Google an Orwellian nightmare? Yes, Google Is Getting Too Big For Its Britches - Case In Point: Google Health. I'm not so sure. Linking Google's search dominance and the intended use of Google Health in some sort of surveillance conspiracy is a bit of a stretch. If they were related, it would probably just be a clever way to increase ad revenue. It is interesting that many people have a Big Brother fear reaction to the collection of any personal information. Personally, BB doesn't worry me nearly as much as all the little thieves out there who would steal my information for their own benefit, at my expense.

Posted in Google | Leave a comment

Exploring Cloud Computing Development

It's not easy getting your arms around this one. The term Cloud Computing has become a catch-all for a number of related technologies that have been used in enterprise-class systems for many years (e.g. grid computing, SOA, virtualization, etc.).

One of the primary concerns of cloud computing in Healthcare IT is privacy and security.  A majority of the content and comments in just about every article or blog post about CC, re: health data or not, deal with these concerns. I'm going to save that discussion for a future post.

I'm also not going to dig into the multitude of business and technical trade-offs of these "cloud" options versus more traditional SaaS and other hybrid server approaches. People write books about this stuff, and there's a flood of Internet content that slices and dices these subjects to death.

My purpose here is to provide an overview of cloud computing from a developer's point of view so we can begin to understand what it would take to implement custom software in the cloud. All of the major technical aspects are well covered elsewhere and I'm not going to repeat them here. I'm just going to note the things that I think are important to take into consideration when looking at each option.

Here's a simplified definition of Cloud Computing that's easy to understand and will get us started:

Cloud computing is using the internet to access someone else's software running on someone else's hardware in someone else's data center while paying only for what you use.

As a consumer, let's say of a social networking site or a PHR, this definition fits pretty well. There's even an EMR implemented in the cloud, Practice Fusion, that would fit this definition.

As a developer though, I want it to be my software running in the cloud so I can make use of someone else's infrastructure in a cost-effective manner. There are currently three major CC options. Cloud Options - Amazon, Google, & Microsoft gives a good overview of these.

The Amazon and Google diagrams below were derived from here.

Amazon Web Services

Amazon Cloud Services

The Amazon development model involves building Xen virtual machine images that are run in the cloud by EC2. That means you build your own Linux/Unix or Windows operating system image and upload it to be run in EC2. AWS has many pre-configured images that you can start with and customize to your needs. There are web service APIs (via WSDL) for the additional support services like S3, SimpleDB, and SQS. Because you are building self-contained OS images, you are responsible for your own development and deployment tools.

AWS is the most mature of the CC options. Applications that require the processing of huge amounts of data can make effective use of the AWS on-demand EC2 instances, which are managed by Hadoop.

If you have previous virtual machine experience (e.g. with Microsoft Virtual PC 2007 or VirtualBox), one of the main differences in working with EC2 images is that they do not provide persistent storage. EC2 instances have anywhere from 160 GB to 1.7 TB of attached storage, but it disappears as soon as the instance is shut down. If you want to save data you have to use S3, SimpleDB, or your own remote storage server.

It seems to me that having to manage OS images along with applications development could be burdensome.  On the other hand, having complete control over your operating environment gives you maximum flexibility.

A good example of using AWS is here: How We Built a Web Hosting Infrastructure on EC2.

Google AppEngine

Google App Engine

GAE allows you to run Python/Django web applications in the cloud. Google provides a set of development tools for this purpose, i.e. you can develop your application within the GAE run-time environment on your local system and deploy it after it's been debugged and is working the way you want it.

Google provides entity-based SQL-like (GQL) back-end data storage on their scalable infrastructure (BigTable) that will support very large data sets. Integration with Google Accounts allows for simplified user authentication.

From the GAE web site:  "This is a preview release of Google App Engine. For now, applications are restricted to the free quota limits."

Microsoft Windows Azure

Microsoft Windows Azure

Azure is essentially a Windows OS running in the cloud. You are effectively uploading and running your ASP.NET (IIS7) or .NET (3.5) application. Microsoft provides tight integration of Azure development directly into Visual Studio 2008.

For enterprise Microsoft developers the .NET Services and SQL Data Services (SDS) will make Azure a very attractive option.  The Live Framework provides a resource model that includes access to the Microsoft Live Mesh services.

Bottom line for Azure: If you're already a .NET programmer, Microsoft is creating a very comfortable path for you to migrate to their cloud.

Azure is now in CTP (Community Technology Preview) and is expected to be released later this year.

UPDATE (4/27/09): Here's a good Azure article: Patterns For High Availability, Scalability, And Computing Power With Windows Azure.

Getting Started

All three companies make it pretty easy to get software up and running in the cloud. The documentation is generally good, and each has a quick start tutorial to get you going. I tried out the Google App Engine tutorial and had Bob in the Clouds on their server in about 30 minutes.

Bob's Guest Book

Stop by and sign my cloud guest book!

Misc. Notes:

  • All three systems have Web portal tools for managing and monitoring uploaded applications.
  • The Dr. Dobb's article Computing in the Clouds has a more detailed look at AWS and GAE development.

Which is Best for You?

One of the first things that struck me about these options is how different they all are.  Because of this, from a developer's point-of-view I think you'll quickly have a gut feeling about which one best matches your current skill sets and project requirements. The development components are just one piece of the selection process puzzle though. Which one you actually might end up using (it could very well be none) will also be based on all your other technical and business needs.

UPDATE (6/23/09): Here's a good high-level cloud computing discussion: Reflections on Executive Briefing Event: Cloud & RIA. I like the phrase "Cloud Computing is Elastic" because it captures most of the appealing aspects of the technology. It's no wonder Amazon latched on to that one -- EC2.

Posted in .NET, Cloud Computing, Google, Microsoft | 13 Comments