The enterprise data warehouse (EDW) is the backbone of analytics and business intelligence for most large organizations and many midsize firms. The tools and techniques are proven, the SQL query language is well known, and there’s plenty of expertise available to keep EDWs humming.
The downside of many relational data warehousing approaches is that they’re rigid and hard to change. You start by modeling the data and creating a schema, but this assumes you know all the questions you’ll need to answer. When new data sources and new questions arise, the schema and related ETL and BI applications have to be updated, which usually requires an expensive, time-consuming effort.
Enter Hadoop, which lets you store data on a massive scale at low cost (compared with similarly scaled commercial databases). What’s more, it easily handles variety, complexity and change because you don’t have to conform all the data to a predefined schema.
That sounds great, but where do you find qualified people who know how to use Pig, Hive, Sqoop and the other tools needed to work with Hadoop? More importantly, how do you get fast answers out of a batch-oriented platform that depends on slow and iterative MapReduce data processing?