Reference: Understanding Hadoop in the EDW context of a large enterprise.

Organizations have spent years developing and implementing their EDW ecosystems, and their people rely on them every day. Will the business allow Hadoop to simply come in and upset the existing apple cart? It does not have to: Hadoop need not be a big-data-only solution. A business with an existing EDW system can start using Hadoop to extend the current system.

The business chooses better handling of structured data, both the data coming from data sources and the structured data that has already been processed. Here a large amount of data is arriving in the enterprise. To handle it, the enterprise either adds more hardware and/or horsepower to the existing EDW and operational systems, or considers alternative ways to manage the data. Hadoop (HDFS) can serve as an alternative data staging platform for loading your EDW: MapReduce jobs bring the application data into HDFS, transform it, and then send the transformed data on its way to your EDW.

Using Hadoop (HDFS), we can store both versions of the data in HDFS: the application data before transformation and the transformed data after it. The data is now gathered in one place, which makes it easier to manage, reprocess (if needed) and analyse at a later date. Because Hadoop processes the data, EDW resources and operational systems are freed up to do what they do best: analysis.
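The staging flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a local directory stands in for HDFS, a plain function stands in for the MapReduce job, and the file names, the two-column order layout and the cleaning rules are all hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def stage_and_transform(raw_rows, staging_dir):
    """Keep the raw ("before") data, write a cleaned ("after") copy, return both paths.

    staging_dir is a local directory standing in for an HDFS staging area (assumption).
    """
    staging = Path(staging_dir)
    raw_path = staging / "raw" / "orders.csv"            # hypothetical layout
    clean_path = staging / "transformed" / "orders.csv"  # hypothetical layout
    raw_path.parent.mkdir(parents=True, exist_ok=True)
    clean_path.parent.mkdir(parents=True, exist_ok=True)

    # 1. Land the application data unchanged.
    with raw_path.open("w", newline="") as f:
        csv.writer(f).writerows(raw_rows)

    # 2. Transform: trim and upper-case customer ids, drop rows without an amount,
    #    and format amounts with two decimals. These rules are illustrative only.
    with raw_path.open(newline="") as src, clean_path.open("w", newline="") as dst:
        writer = csv.writer(dst)
        for customer, amount in csv.reader(src):
            if not amount.strip():
                continue
            writer.writerow([customer.strip().upper(), f"{float(amount):.2f}"])

    # 3. Only the transformed file would be handed to the EDW loader.
    return raw_path, clean_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw, clean = stage_and_transform([[" c1 ", "10"], ["c2", ""], ["c3", "5.5"]], tmp)
        print(clean.read_text())
```

Both files remain in the staging area after the run, so the transformation can be repeated from the raw copy if the rules change, without going back to the source system.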

The business chooses to leverage structured data sources that have not been integrated into the EDW, as well as unstructured data sources. Here Hadoop (HDFS) can be used to take advantage of data that is currently unavailable in your EDW. Data that is not yet part of the EDW may have the potential to provide additional insight into your customers, products and services.

You can process and keep the data in Hadoop (HDFS) and, optionally, push the relevant data into your EDW to be analysed together with the existing data. You need not structure all the unstructured data for the EDW. You can then analyse the data using big data apps or BI/analytics tools. Here Hadoop (HDFS) complements the EDW well in terms of storage: any data that the EDW cannot handle well can be stored in Hadoop.

The business chooses to archive all data. Use Hadoop to archive all your data, on-premises or in the cloud. For the first time, the data need not be destroyed at the end of its regulatory life just to save on storage costs. Business analysts and data scientists need not limit their analysis to the last three, five or seven years. The data can also be stored more easily and more cost-effectively.

The enterprise takes the bold step of using Hadoop as the landing platform for all data, exploiting the strengths of both the EDW and Hadoop to extract even more value and insight from one of its greatest strategic assets: data. Here, data captured in Hadoop can be stored in its raw, native state. It need not be formatted upfront, as it must be for traditional, structured data stores; it can be formatted at the time of the data request. Loading data in its native state saves programming effort.
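Formatting data at request time is often called schema-on-read. A minimal Python sketch of the idea follows, assuming raw records are kept as JSON lines. The sample records, the field names and the helper read_with_schema are hypothetical and not part of any Hadoop API.

```python
import json

# Raw records kept exactly as they arrived (hypothetical sample data).
RAW_EVENTS = [
    '{"customer": "C1", "amount": "19.90", "channel": "web"}',
    '{"customer": "C2", "amount": "5", "note": "gift"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> converter) at read time.

    Fields missing from a record become None; fields not named in the
    schema are ignored but stay untouched in the raw data.
    """
    for line in raw_lines:
        record = json.loads(line)
        yield {
            name: (convert(record[name]) if name in record else None)
            for name, convert in schema.items()
        }


# Two different requests format the same raw data in two different ways.
billing_view = list(read_with_schema(RAW_EVENTS, {"customer": str, "amount": float}))
marketing_view = list(read_with_schema(RAW_EVENTS, {"customer": str, "channel": str}))
```

Nothing was decided about the structure when the records were stored, so a new question later only needs a new schema, not a reload of the data.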