The cloud computing news coming out of Google I/O might not set the larger world afire, but it might light a fire under market leader Amazon Web Services. Google used its annual developer conference to unveil a slew of new cloud services Wednesday, including one called Dataflow that makes it easy to write data-processing pipelines that incorporate both batch- and stream-processing capabilities.
Based on Google’s FlumeJava data-pipeline tool and its MillWheel stream-processing system, Dataflow is the company’s answer to Amazon’s Elastic MapReduce and Kinesis, all in one package. Users can still run their own Hadoop clusters on Google Compute Engine, the company’s infrastructure-as-a-service cloud. But Google Cloud Platform marketing head Brian Goldfarb described Dataflow’s underlying technologies as having been created, essentially, to overcome the complexity and latency limitations inherent in MapReduce (both the Google version and Hadoop MapReduce).
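To make the "batch and stream in one pipeline" idea concrete, here is a minimal sketch in plain Python. It is not Dataflow's API, nor FlumeJava's or MillWheel's; every name in it (`tokenize`, `running_counts`, `pipeline`, and the two sources) is hypothetical and chosen only for illustration. It assumes the core appeal is that the pipeline logic is written once and then applied to either a bounded data set or a feed that arrives one record at a time.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def tokenize(lines: Iterable[str]) -> Iterator[str]:
    """Split each incoming line into lowercase words, one at a time."""
    for line in lines:
        for word in line.lower().split():
            yield word


def running_counts(words: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Emit (word, count so far) as each word arrives."""
    counts: Counter = Counter()
    for word in words:
        counts[word] += 1
        yield word, counts[word]


def pipeline(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """The pipeline is defined once, independent of where its input comes from."""
    return running_counts(tokenize(lines))


def batch_source() -> List[str]:
    """Hypothetical bounded input: the whole data set is available up front."""
    return ["Flume Java", "MillWheel flume"]


def stream_source() -> Iterator[str]:
    """Hypothetical unbounded-style input: records are handed over one by one."""
    for line in ["Flume Java", "MillWheel flume"]:
        yield line


# The same pipeline definition runs over both kinds of input.
batch_result = list(pipeline(batch_source()))
stream_result = list(pipeline(stream_source()))
```

Because the transforms are lazy generators, results are produced as soon as each record arrives instead of after a full pass over the data. A real managed service would add much more on top of this (distribution, windowing, fault tolerance), none of which this sketch attempts.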
(A collection of open source tools that cover these same capabilities…