Hadoop World 2011: A Winding Road To The Enterprise
by Barry Thompson
I made it down to Hadoop World at the Sheraton New York Hotel this week. The turnout was great and the conference was energetic. Hadoop is one of the most interesting and important developments in the world of big data right now, and I was glad to meet so many people working hands-on with it and to learn everything I could.
Lots of Activity Around Hadoop
It’s clear that Hadoop is a train that doesn’t show signs of stopping, and it is headed into a mainstream enterprise IT organization near you.
One of the biggest pieces of news was simply that Hadoop is now straightforward to install: there are distributions available that don't require you to download and compile the source code. A whole ecosystem of software components and products now works with it, and the energy of the conference was all about raising awareness of this activity.
But Challenges Remain, Especially for Enterprise Adoption
When I survey the Hadoop landscape I also see some big challenges ahead. Despite all the progress that developers and vendors are making, enterprise usage is going to require a few important pieces.
- Management: What I hear from everyone is that systems are still pretty labor-intensive. The development-heavy organizations pushing Hadoop the hardest can deal with this, but getting into the enterprise will require some simplification here.
- Performance: The shuffle stage in Hadoop, where map output is moved across the network to the reducers, has a long way to go from a performance perspective. Its problems are well known, and in many large-scale deployments it is not a big deal for massive Web companies to throw 5,000 blades at the problem when they hit a performance wall. That doesn't fly in the enterprise.
- Compression: This is another big area where major optimizations will really help as Hadoop moves into the enterprise. By default, HDFS keeps three replicas of every block, so data written to a Hadoop cluster consumes 3x its own size in disk space. Compression is going to be huge here.
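To make the compression point concrete, here is a rough back-of-the-envelope sketch. It assumes the HDFS default replication factor of 3 and treats compression as a simple raw-to-compressed size ratio; the 100 TB dataset and the 4:1 ratio are hypothetical numbers chosen for illustration, not measurements, and `hdfs_footprint_tb` is a made-up helper name.

```python
# Rough HDFS storage estimate: replication multiplies the footprint,
# compression divides it. The example numbers are illustrative assumptions.

def hdfs_footprint_tb(raw_tb: float, replication: int = 3, compression_ratio: float = 1.0) -> float:
    """Return approximate disk space (TB) consumed on the cluster.

    raw_tb: size of the logical dataset before compression.
    replication: copies HDFS keeps of each block (HDFS default is 3).
    compression_ratio: raw size / compressed size (1.0 means no compression).
    """
    if replication < 1 or compression_ratio <= 0:
        raise ValueError("replication must be >= 1 and compression_ratio > 0")
    return raw_tb / compression_ratio * replication


if __name__ == "__main__":
    raw = 100.0  # hypothetical 100 TB dataset
    print(hdfs_footprint_tb(raw))                         # uncompressed, 3 replicas
    print(hdfs_footprint_tb(raw, compression_ratio=4.0))  # assumed 4:1 codec
```

Under these assumptions, 100 TB of raw data occupies 300 TB uncompressed, while a 4:1 compression ratio brings the replicated footprint down to 75 TB, which is less than the raw data itself. Real ratios depend heavily on the data and the codec.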
Adding Resiliency & Performance to the Hadoop Cluster
What would a data fabric add to Hadoop? Having a highly scalable, ultra-efficient data movement engine as the foundation for a Hadoop-based application could yield big benefits in both performance and resiliency.
Furthermore, giving a Hadoop application easy access to an enterprise's complete set of data stores further increases the power of the overall system. Imagine a big-analytics system with Hadoop-driven processing across enterprise-scale data.
Big Data Movement
So how are you thinking about Hadoop in the enterprise? We’d love to talk to you about it. Please comment below.