Archive for February, 2009

Is Google a Monopoly? Just ask Stack Overflow (and me).

Sunday, February 22nd, 2009

Today's New York Times Digital Domain column, Everyone Loves Google, Until It's Too Big, quotes Jeff Atwood, probably based on this post: The Elephant in the Room: Google Monoculture.

It's interesting that they picked Stack Overflow as an example because even Jeff says:

Now, I don't claim that Stack Overflow is representative of every site on the internet -- obviously it isn't.

I don't know, Jeff, I think you're being too modest. This blog doesn't have nearly the number of visits that SO does, but 95.87% of its search traffic for the last month was from Google. Based on an N of 2, then, I'd say that Google does have a monopoly on Internet searching!

UPDATE (3/5/09):

Is Google an Orwellian nightmare? Yes, Google Is Getting Too Big For Its Britches - Case In Point: Google Health. I'm not so sure. Linking Google's search dominance and the intended use of Google Health in some sort of surveillance conspiracy is a bit of a stretch. If they were related, it would probably just be a clever way to increase ad revenue. It is interesting that many people have a Big Brother fear reaction to the collection of any personal information. Personally, BB doesn't worry me nearly as much as all the little thieves out there who would steal my information for their own benefit, at my expense.

Exploring Cloud Computing Development

Saturday, February 7th, 2009

It's not easy getting your arms around this one. The term Cloud Computing has become a catch-all for a number of related technologies that have been used in enterprise-class systems for many years (e.g. grid computing, SOA, virtualization, etc.).

The primary concerns about cloud computing in Healthcare IT are privacy and security. A majority of the content and comments in just about every article or blog post about CC, whether about health data or not, deals with these concerns. I'm going to save that discussion for a future post.

I'm also not going to dig into the multitude of business and technical trade-offs of these "cloud" options versus more traditional SaaS and other hybrid server approaches. People write books about this stuff and there's a flood of Internet content that slices and dices these subjects to death.

My purpose here is to provide an overview of cloud computing from a developer's point of view so we can begin to understand what it would take to implement custom software in the cloud. All of the major technical aspects are well covered elsewhere and I'm not going to repeat them here. I'm just going to note the things that I think are important to take into consideration when looking at each option.

Here's a simplified definition of Cloud Computing that's easy to understand and will get us started:

Cloud computing is using the internet to access someone else's software running on someone else's hardware in someone else's data center while paying only for what you use.

As a consumer of, say, a social networking site or PHR, this definition fits pretty well. There's even an EMR implemented in the cloud, Practice Fusion, that would fit this definition.

As a developer, though, I want it to be my software running in the cloud so I can make use of someone else's infrastructure in a cost-effective manner. There are currently three major CC options. Cloud Options - Amazon, Google, & Microsoft gives a good overview of these.

The Amazon and Google diagrams below were derived from here.

Amazon Web Services

Amazon Cloud Services

The Amazon development model involves building Xen virtual machine images that are run in the cloud by EC2. That means you build your own Linux/Unix or Windows operating system image and upload it to be run in EC2. AWS has many pre-configured images that you can start with and customize to your needs. There are web service APIs (via WSDL) for the additional support services like S3, SimpleDB, and SQS. Because you are building self-contained OS images, you are responsible for your own development and deployment tools.

AWS is the most mature of the CC options. Applications that require the processing of huge amounts of data can make effective use of the AWS on-demand EC2 instances, which can be managed by Hadoop.

If you have previous virtual machine experience (e.g. with Microsoft Virtual PC 2007 or VirtualBox), one of the main differences in working with EC2 images is that they do not provide persistent storage. The EC2 instances have anywhere from 160 GB to 1.7 TB of attached storage, but its contents disappear as soon as the instance is shut down. If you want to save data you have to use S3, SimpleDB, or your own remote storage server.
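To make the storage difference concrete, here's a toy Python sketch. It does not call any AWS API; the `Instance` and `DurableStore` classes, the file path, and the key name are all hypothetical stand-ins for an EC2 instance and an external service such as S3.

```python
# Toy model only: these classes are hypothetical and do not call any AWS API.

class DurableStore:
    """Stands in for an external service (e.g. S3) that outlives instances."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]


class Instance:
    """Stands in for an EC2 instance with attached, non-persistent storage."""

    def __init__(self):
        self.local_disk = {}
        self.running = True

    def write_local(self, path, data):
        self.local_disk[path] = data

    def shut_down(self):
        # The attached storage goes away together with the instance.
        self.local_disk.clear()
        self.running = False


store = DurableStore()
vm = Instance()
vm.write_local("/mnt/results.csv", b"a,b\n1,2\n")

# Copy anything worth keeping to durable storage before shutting down.
store.put("results.csv", vm.local_disk["/mnt/results.csv"])
vm.shut_down()

# A replacement instance starts with an empty disk; the durable copy survives.
replacement = Instance()
print("/mnt/results.csv" in replacement.local_disk)
print(store.get("results.csv"))
```

The point is the ordering: anything you care about has to be copied off the instance before it goes away.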

It seems to me that having to manage OS images along with application development could be burdensome. On the other hand, having complete control over your operating environment gives you maximum flexibility.

A good example of using AWS is here: How We Built a Web Hosting Infrastructure on EC2.

Google AppEngine

Google App Engine

GAE allows you to run Python/Django web applications in the cloud. Google provides a set of development tools for this purpose, i.e. you can develop your application within the GAE run-time environment on your local system and deploy it after it's been debugged and is working the way you want it.

Google provides entity-based SQL-like (GQL) back-end data storage on their scalable infrastructure (BigTable) that will support very large data sets. Integration with Google Accounts allows for simplified user authentication.
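To give a feel for the size of application involved, here's a minimal guest book sketch written as a plain WSGI application. It uses only the Python standard library, not the App Engine SDK. The `guestbook_app` function, the `content` form field, and the in-memory `GREETINGS` list (standing in for stored entities) are my own assumptions for illustration.

```python
# Sketch only: standard library WSGI, not the App Engine SDK.
from html import escape
from urllib.parse import parse_qs

GREETINGS = []  # hypothetical in-memory stand-in for stored entities


def guestbook_app(environ, start_response):
    """WSGI app: a POST adds a greeting; every response lists all greetings."""
    if environ["REQUEST_METHOD"] == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        form = parse_qs(environ["wsgi.input"].read(size).decode("utf-8"))
        content = form.get("content", [""])[0].strip()
        if content:
            GREETINGS.append(content)

    # Escape user input before putting it into the page.
    items = "".join("<li>%s</li>" % escape(g) for g in GREETINGS)
    body = ("<html><body><ul>%s</ul></body></html>" % items).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("localhost", 8080, guestbook_app).serve_forever()` will serve it. A real GAE application would keep greetings in the datastore rather than in a module-level list, which is lost whenever the process restarts.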

From the GAE web site:  "This is a preview release of Google App Engine. For now, applications are restricted to the free quota limits."

Microsoft Windows Azure


Azure is essentially a Windows OS running in the cloud. You are effectively uploading and running your ASP.NET (IIS7) or .NET (3.5) application. Microsoft provides tight integration of Azure development directly into Visual Studio 2008.

For enterprise Microsoft developers the .NET Services and SQL Data Services (SDS) will make Azure a very attractive option.  The Live Framework provides a resource model that includes access to the Microsoft Live Mesh services.

Bottom line for Azure: If you're already a .NET programmer, Microsoft is creating a very comfortable path for you to migrate to their cloud.

Azure is now in CTP (Community Technology Preview) and is expected to be released later this year.

UPDATE (4/27/09) Here's a good Azure article:  Patterns For High Availability, Scalability, And Computing Power With Windows Azure.

Getting Started

All three companies make it pretty easy to get software up and running in the cloud. The documentation is generally good, and each has a quick start tutorial to get you going. I tried out the Google App Engine tutorial and had Bob in the Clouds on their server in about 30 minutes.

Bob's Guest Book

Stop by and sign my cloud guest book!

Misc. Notes:

  • All three systems have Web portal tools for managing and monitoring uploaded applications.
  • The Dr. Dobb's article Computing in the Clouds has a more detailed look at AWS and GAE development.

Which is Best for You?

One of the first things that struck me about these options is how different they all are.  Because of this, from a developer's point-of-view I think you'll quickly have a gut feeling about which one best matches your current skill sets and project requirements. The development components are just one piece of the selection process puzzle though. Which one you actually might end up using (it could very well be none) will also be based on all your other technical and business needs.

UPDATE (6/23/09): Here's a good high level cloud computing discussion: Reflections on Executive Briefing Event: Cloud & RIA. I like the phrase "Cloud Computing is Elastic" because it captures most of the appealing aspects of the technology. It's no wonder Amazon latched on to that one -- EC2.
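A rough way to see why "elastic" and "paying only for what you use" are appealing is to compare provisioning for peak load with paying per server-hour. The sketch below is illustrative only: the hourly rate and the demand figures are made up and are not real prices from any provider.

```python
# All numbers below are made up for illustration; they are not real prices.
HOURLY_RATE = 0.10                    # hypothetical cost per server-hour
demand = [2, 2, 3, 10, 12, 4, 2, 2]   # hypothetical servers needed each hour


def fixed_cost(demand, rate):
    """Provision for the peak and keep that many servers on every hour."""
    return max(demand) * len(demand) * rate


def elastic_cost(demand, rate):
    """Pay only for the servers actually running in each hour."""
    return sum(demand) * rate


print("fixed:   %.2f" % fixed_cost(demand, HOURLY_RATE))
print("elastic: %.2f" % elastic_cost(demand, HOURLY_RATE))
```

The spikier the demand, the bigger the gap between the two numbers. With flat demand they come out the same.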