Bringing Down “The House” in Data Warehousing

Data Warehousing means different things to different people. Generally speaking, our architects believe Data Warehousing entails “The House” and all the processes needed to plan, build, operate, and control the Data Warehouse.

Bringing Down “The House”

Data Warehousing starts with a solid methodology combined with a strong PMO (Project Management Office) framework. We believe ‘the house’ needs a great ETL (Extract, Transform, Load) or EAI (Enterprise Application Integration) tool to push or pull data into a data store. This data store could be an ODS (Operational Data Store), a Staging Area, or a CIF (Corporate Information Factory). There are many ways to build a Data Warehouse. Just as in Data Mining, the scope and requirements will dictate what type of “house” you need to build.

We are firm believers in the star schema/MD (multidimensional design) for marts that need OLAP (Online Analytical Processing) tools. We also find that an MD lets us Data Miners slice and dice the data the way we like it. Special care is needed when building fact tables (tall tables) with conformed dimensions. Good performance relies on smart aggregate table design, proper indexing, and savvy use of the parallel processing capabilities built into your DWH tools. Slowly changing and rapidly changing dimensions must be addressed early in the development process. Bridge tables and snowflakes may be necessary at times, even though we don’t advocate them.

As with Data Mining, data quality is the key to a successful Data Warehouse. Your DWH architect needs to think through every facet of the architecture: business, application, information, and technical. Your developers, integrators, and transformers need to understand the DWH blueprint while helping to design it. Our technical architects will ensure a web-based architecture that exposes the necessary web services to leverage an SOA (Service-Oriented Architecture). For compute-intensive operations we also promote grid computing, especially for large-volume aggregations and analytics that may be pre-calculated for the mart. Last but not least, we promote interconnections with a Meta Data Repository (MDR).
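To make the star schema, fact table, aggregate table, and slowly changing dimension points concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is an illustrative assumption rather than a prescribed DBmind design: the table names (dim_date, dim_customer, fact_sales, agg_sales_month_region), the YYYYMMDD date key, and the choice of a Type 2 treatment for changes to a customer's region. The date dimension is the kind of table that would be conformed (shared) across fact tables in other marts; a real warehouse would run this logic in its ETL tool against its own DBMS.

```python
import sqlite3
from datetime import date

# Illustrative star schema; all table and column names are assumptions.
SCHEMA = """
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,  -- smart key, e.g. 20240115
    full_date TEXT NOT NULL,
    month     INTEGER NOT NULL,
    year      INTEGER NOT NULL
);
-- Customer dimension handled as a Type 2 slowly changing dimension:
-- a change to the tracked attribute (region) adds a new row version.
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    customer_id  TEXT NOT NULL,                      -- business key
    region       TEXT NOT NULL,
    valid_from   TEXT NOT NULL,
    valid_to     TEXT,                               -- NULL while current
    is_current   INTEGER NOT NULL DEFAULT 1
);
-- Fact table: narrow rows, many of them ("tall"), keyed by dimension keys.
CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL REFERENCES dim_date (date_key),
    customer_key INTEGER NOT NULL REFERENCES dim_customer (customer_key),
    amount       REAL NOT NULL
);
CREATE INDEX ix_fact_sales_date ON fact_sales (date_key);
CREATE INDEX ix_fact_sales_customer ON fact_sales (customer_key);
"""


def ensure_date(conn: sqlite3.Connection, d: date) -> int:
    """Add the day to the date dimension if missing; return its YYYYMMDD key."""
    key = d.year * 10000 + d.month * 100 + d.day
    conn.execute(
        "INSERT OR IGNORE INTO dim_date (date_key, full_date, month, year) VALUES (?, ?, ?, ?)",
        (key, d.isoformat(), d.month, d.year),
    )
    return key


def upsert_customer_scd2(conn: sqlite3.Connection, customer_id: str, region: str, as_of: str) -> int:
    """Type 2 SCD: expire the current version if region changed, then add a new one."""
    row = conn.execute(
        "SELECT customer_key, region FROM dim_customer WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if row is not None and row[1] == region:
        return row[0]  # unchanged: keep using the current surrogate key
    if row is not None:
        conn.execute(
            "UPDATE dim_customer SET valid_to = ?, is_current = 0 WHERE customer_key = ?",
            (as_of, row[0]),
        )
    cur = conn.execute(
        "INSERT INTO dim_customer (customer_id, region, valid_from) VALUES (?, ?, ?)",
        (customer_id, region, as_of),
    )
    return cur.lastrowid


def record_sale(conn: sqlite3.Connection, customer_id: str, region: str, sale_date: date, amount: float) -> None:
    """Load one fact row, resolving business keys to surrogate keys first."""
    customer_key = upsert_customer_scd2(conn, customer_id, region, sale_date.isoformat())
    date_key = ensure_date(conn, sale_date)
    conn.execute(
        "INSERT INTO fact_sales (date_key, customer_key, amount) VALUES (?, ?, ?)",
        (date_key, customer_key, amount),
    )


def build_monthly_region_aggregate(conn: sqlite3.Connection) -> None:
    """Pre-compute a small aggregate table so queries need not scan every fact row."""
    conn.executescript("""
        DROP TABLE IF EXISTS agg_sales_month_region;
        CREATE TABLE agg_sales_month_region AS
        SELECT d.year, d.month, c.region, SUM(f.amount) AS total_amount
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        GROUP BY d.year, d.month, c.region;
    """)


def run_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    record_sale(conn, "C001", "East", date(2024, 1, 15), 100.0)
    record_sale(conn, "C001", "East", date(2024, 1, 20), 50.0)
    # The customer moves: a new dimension version is created, old facts keep "East".
    record_sale(conn, "C001", "West", date(2024, 2, 3), 75.0)
    build_monthly_region_aggregate(conn)
    return conn


if __name__ == "__main__":
    conn = run_demo()
    for row in conn.execute("SELECT * FROM agg_sales_month_region ORDER BY year, month, region"):
        print(row)
```

Because each region change creates a new customer version with its own surrogate key, the January sales stay attributed to East after the customer moves West, and the pre-computed aggregate preserves that history instead of restating it.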
It is equally important to tie the web services’ UDDI entries, WSDL descriptions, XML, and SOAP headers into the MDR.
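One way to picture that interconnection is to record each service’s WSDL details in the MDR alongside the warehouse tables it exposes, so that a change to a table can be traced to the services that depend on it. The sketch below is hypothetical: the MetadataRepository and ServiceEntry classes, the SalesMart WSDL, and the example.com endpoint are invented for illustration. It parses a WSDL 1.1 document with Python’s standard xml.etree.ElementTree and captures only the service name, SOAP endpoint, and operation names; UDDI registrations and SOAP header definitions would need similar handling.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# WSDL 1.1 and WSDL 1.1 SOAP binding namespaces.
NS = {
    "wsdl": "http://schemas.xmlsoap.org/wsdl/",
    "soap": "http://schemas.xmlsoap.org/wsdl/soap/",
}

# Hypothetical WSDL for a sales mart service; the endpoint is a placeholder.
SAMPLE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/" name="SalesMart">
  <portType name="SalesMartPortType">
    <operation name="GetMonthlySales"/>
    <operation name="GetRegionTotals"/>
  </portType>
  <service name="SalesMartService">
    <port name="SalesMartSoap" binding="SalesMartBinding">
      <soap:address location="http://example.com/dwh/sales"/>
    </port>
  </service>
</definitions>"""


@dataclass
class ServiceEntry:
    """One web service as recorded in the (hypothetical) MDR."""
    name: str
    endpoint: str
    operations: list[str]
    source_tables: list[str]  # warehouse/mart tables the service reads


@dataclass
class MetadataRepository:
    """Toy in-memory MDR keyed by service name."""
    services: dict[str, ServiceEntry] = field(default_factory=dict)

    def register_wsdl(self, wsdl_xml: str, source_tables: list[str]) -> ServiceEntry:
        """Parse a WSDL 1.1 document and link the service to its source tables."""
        root = ET.fromstring(wsdl_xml)
        service = root.find("wsdl:service", NS)
        address = service.find("wsdl:port/soap:address", NS)
        operations = [op.get("name") for op in root.findall("wsdl:portType/wsdl:operation", NS)]
        entry = ServiceEntry(
            name=service.get("name"),
            endpoint=address.get("location"),
            operations=operations,
            source_tables=list(source_tables),
        )
        self.services[entry.name] = entry
        return entry

    def services_using(self, table: str) -> list[str]:
        """Impact analysis: names of registered services that expose `table`."""
        return sorted(name for name, e in self.services.items() if table in e.source_tables)


if __name__ == "__main__":
    mdr = MetadataRepository()
    mdr.register_wsdl(SAMPLE_WSDL, ["fact_sales", "dim_customer"])
    print(mdr.services_using("fact_sales"))  # ['SalesMartService']
```

Keeping the service-to-table links in the repository, rather than only in each service’s own code, is what makes this kind of impact analysis a simple lookup.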

To sum up, building a Data Warehouse requires clearly articulated scope and requirements documents. It is crucial to adopt a DWH development methodology and to staff the right people for each job. Remember, at DBmind, Data Warehousing is more than just “The House”!