
Data Warehouse Definition

A data warehouse (DW) is a central repository of corporate information, such as analytical, historical, and customer data, derived from operational systems and external data sources.

What is meant by data warehousing?

Data warehousing is the electronic process of combining information from several sources into one comprehensive database, known as a data warehouse. Because the data is retained, information that was integrated from multiple sources months or even years earlier can still be retrieved from the warehouse.

With data warehousing, a company can analyze its data more holistically. Because the stored data brings together all of the available information, it also makes data mining possible.

A data warehouse (DW) is a store for data collected by an enterprise. The data comes from various sources, which may be linked to different business processes inside or outside the organization. Also known as an Enterprise Data Warehouse (EDW), it is considered an important part of business intelligence. This repository supports reporting and analysis of both historical and current data, and the results are distributed to knowledge workers.

Data warehouses are usually of two kinds: physical and logical. The data in a data warehouse is generally loaded from an organization's operational systems. On the way, the data may pass through an operational data store, where it is cleansed to ensure data quality before it is used in the data warehouse.
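
The cleansing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CustomerRecord` type, the `cleanse` function, the field names, and the validation rules are all hypothetical and are not taken from any particular warehouse product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    # Hypothetical shape of a cleansed row; real schemas will differ.
    customer_id: int
    email: str
    country: str


def cleanse(raw_rows: list[dict]) -> tuple[list[CustomerRecord], list[dict]]:
    """Split raw operational rows into clean records and rejected rows."""
    clean: dict[int, CustomerRecord] = {}
    rejected: list[dict] = []
    for row in raw_rows:
        # Normalize formatting so equivalent values compare equal.
        email = str(row.get("email") or "").strip().lower()
        country = str(row.get("country") or "").strip().upper()
        try:
            customer_id = int(row["customer_id"])
        except (KeyError, TypeError, ValueError):
            rejected.append(row)  # no usable key
            continue
        if "@" not in email or not country:
            rejected.append(row)  # flag bad data instead of silently dropping it
            continue
        # A later row with the same key overwrites the earlier one (de-duplication).
        clean[customer_id] = CustomerRecord(customer_id, email, country)
    return list(clean.values()), rejected


raw = [
    {"customer_id": "1", "email": " Ann@Example.com ", "country": "us"},
    {"customer_id": "1", "email": "ann@example.com", "country": "US"},   # duplicate
    {"customer_id": "2", "email": "not-an-email", "country": "DE"},      # invalid email
    {"customer_id": None, "email": "bob@example.com", "country": "FR"},  # missing key
]
warehouse_rows, rejected_rows = cleanse(raw)
print(warehouse_rows)
print(len(rejected_rows))
```

Rejected rows are returned rather than discarded, so that they can be inspected and corrected at the source.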

Data Warehouse vs. Data Mart

  • A data warehouse is a large environment that takes the information-processing load off transaction-oriented databases, and it covers many subject areas. A data mart, by contrast, is a small part of a data warehouse that is designed to focus on a single functional area.
  • Data warehouses are generally controlled by a central organizational unit. A data mart, on the other hand, is controlled by a single department in an organization.
  • Data warehouses have a complex design, whereas data marts are relatively simple in structure.
  • The data in a data warehouse is collected from many sources. Data marts integrate data from only a few sources.
  • The main advantage of data warehouses is that they are accessible to different sections and branches of an organization, so knowledge workers can use both historical and current data as required. Data marts, on the other hand, offer easier data manipulation, faster comprehension of the data, and better report-generation performance because queries are small.
  • Data warehouses may take months or years to implement, and the decisions based on their analytics are strategic. Data marts can be implemented within months, and the decisions based on them are tactical.
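
To make the relationship concrete, the sketch below models a data mart as a department-specific subset of a warehouse table, using Python's built-in `sqlite3` module. The table `warehouse_expenses`, the view `marketing_mart`, and the sample figures are hypothetical. Real data marts are often separate physical stores rather than views, so treat this as one possible arrangement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The warehouse holds data for every department (hypothetical schema).
CREATE TABLE warehouse_expenses (
    expense_id INTEGER PRIMARY KEY,
    department TEXT NOT NULL,
    region     TEXT NOT NULL,
    amount     REAL NOT NULL
);
INSERT INTO warehouse_expenses (department, region, amount) VALUES
    ('marketing', 'EU', 120.0),
    ('marketing', 'US', 80.0),
    ('finance',   'EU', 300.0);

-- The data mart exposes only the single functional area one department needs.
CREATE VIEW marketing_mart AS
    SELECT expense_id, region, amount
    FROM warehouse_expenses
    WHERE department = 'marketing';
""")

mart_total = conn.execute("SELECT SUM(amount) FROM marketing_mart").fetchone()[0]
warehouse_total = conn.execute("SELECT SUM(amount) FROM warehouse_expenses").fetchone()[0]
print(mart_total, warehouse_total)
```

Queries against the mart touch fewer rows and columns than queries against the whole warehouse, which is one reason reports built on a mart tend to be simpler and faster.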

Environment for Data Warehouses and Data Marts

An ideal data warehouse or a data mart environment consists of the following components:

  • Sources responsible for providing data to the data mart or warehouse;

  • Data integration technology and processes required to make the data ready for use;

  • Data storage architectures, required to record data into the data warehouses or marts;

  • A series of supporting tools and processes, such as metadata management, governance, and data quality, that enable users to carry out their activities

Difference between Normal and Dimensional Approach Regarding Storage

The normalized storage approach stores data in the data warehouse according to database normalization rules. The dimensional storage approach, by contrast, separates the data into ‘facts’ (generally numeric transaction data) and ‘dimensions’ (reference information that gives context to the facts).
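
As an illustration of the dimensional approach, here is a minimal star-schema sketch using Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns, and the sample rows are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: descriptive reference data that gives context to the facts.
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,   -- e.g. 20240115
    year     INTEGER NOT NULL,
    month    INTEGER NOT NULL
);
-- Facts: numeric measures, keyed by the dimensions they belong to.
CREATE TABLE fact_sales (
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    quantity    INTEGER NOT NULL,
    revenue     REAL NOT NULL
);
INSERT INTO dim_product VALUES
    (1, 'Laptop', 'Hardware'), (2, 'Mouse', 'Hardware'), (3, 'Antivirus', 'Software');
INSERT INTO dim_date VALUES (20240115, 2024, 1), (20240210, 2024, 2);
INSERT INTO fact_sales VALUES
    (1, 20240115, 2, 2000.0),
    (2, 20240115, 5, 100.0),
    (3, 20240210, 10, 500.0),
    (1, 20240210, 1, 1000.0);
""")

# Roll-up: total revenue per product category.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()

# Drill-down: the same measure, broken out further by month.
revenue_by_category_month = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()

print(revenue_by_category)
print(revenue_by_category_month)
```

Every analytical question follows the same pattern: join the fact table to one or more dimensions and aggregate a measure.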

Dimensional Approach Pros:

  • Ease of maintenance and interpretation as experienced by administrators;

  • Enhanced data retrieval;

  • Ideal for slice-dice, roll-up-drill-down data analysis

Dimensional Approach Cons:

  • Increase in data loading time;

  • Flexibility is reduced when dimensions or the business change;

  • Storage size tends to increase because of denormalization

Normal Approach Pros:

  • Plug-and-play system in case of change in business or dimension;

  • All parts are componentized, that is, broken down to the simplest level;

  • Common entities can be reused. For instance, an entity whose attributes serve two or more subject areas is stored once and used by all of them, so no data needs to be duplicated

Normal Approach Cons:

  • Data retrieval can be cumbersome sometimes;

  • Number of entities increases, which can affect maintenance;

  • When working with any subject area, users have to refer to a lot of entities, which makes interpretation complicated
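
The storage and retrieval trade-off between the two approaches can be seen in a small, purely illustrative Python sketch. The entities (`customers`, `products`, `orders`) and the `denormalize` helper are hypothetical. In-memory dictionaries stand in for database tables.

```python
# Normalized layout: each customer and product is stored exactly once.
customers = {1: {"name": "Acme Corp", "city": "Berlin"}}
products = {10: {"name": "Laptop", "category": "Hardware"}}
orders = [
    {"order_id": 100, "customer_id": 1, "product_id": 10, "quantity": 2},
    {"order_id": 101, "customer_id": 1, "product_id": 10, "quantity": 1},
    {"order_id": 102, "customer_id": 1, "product_id": 10, "quantity": 4},
]


def denormalize(orders, customers, products):
    """Join every order with its customer and product into one wide row."""
    wide_rows = []
    for order in orders:
        customer = customers[order["customer_id"]]
        product = products[order["product_id"]]
        wide_rows.append({
            "order_id": order["order_id"],
            "quantity": order["quantity"],
            "customer_name": customer["name"],
            "customer_city": customer["city"],
            "product_name": product["name"],
            "product_category": product["category"],
        })
    return wide_rows


wide = denormalize(orders, customers, products)

# Normalized: reading an order's city needs a second lookup, but the name exists once.
city_of_first_order = customers[orders[0]["customer_id"]]["city"]
copies_normalized = sum(1 for c in customers.values() if c["name"] == "Acme Corp")

# Denormalized: the city is right on the row, but the name is repeated per order.
city_from_wide_row = wide[0]["customer_city"]
copies_denormalized = sum(1 for row in wide if row["customer_name"] == "Acme Corp")

print(copies_normalized, copies_denormalized)
```

The normalized layout keeps a single copy of each entity at the cost of extra lookups, while the wide rows answer a question in one read but repeat the same values on every order.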

Design Methods of Data Warehouses

Data warehouses are structured based on two designs:

  • Bottom-up design: Data marts are created first to provide reporting capabilities. The data marts are then integrated into an all-inclusive data warehouse. The data warehouse is implemented in the form of ‘the bus’ (a collection of conformed dimensions and conformed facts). Here, dimensions are shared between facts in two or more data marts.

  • Top-down design: Data at the greatest level of detail, also known as ‘atomic data’, is recorded directly in the data warehouse. Dimensional data marts, which contain the data needed for specific business processes or departments, are then created from the warehouse.

Some data warehouses take a hybrid approach to design, i.e. they combine the top-down and bottom-up approaches.
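
The idea behind the bottom-up bus can be sketched as follows: two data marts are built independently but share one dimension (often called a conformed dimension), so their results can be combined. The names `dim_product`, `sales_mart`, `support_mart`, and `total_by_category` are hypothetical, and plain Python dictionaries stand in for tables.

```python
# A shared dimension: one definition of "product" used by every mart.
dim_product = {
    1: {"name": "Laptop", "category": "Hardware"},
    2: {"name": "Antivirus", "category": "Software"},
}

# Two independently built data marts whose facts use the same product keys.
sales_mart = [
    {"product_key": 1, "revenue": 2000.0},
    {"product_key": 2, "revenue": 500.0},
    {"product_key": 1, "revenue": 1000.0},
]
support_mart = [
    {"product_key": 1, "tickets": 4},
    {"product_key": 2, "tickets": 1},
]


def total_by_category(fact_rows, measure):
    """Sum one measure per product category using the shared dimension."""
    totals = {}
    for row in fact_rows:
        category = dim_product[row["product_key"]]["category"]
        totals[category] = totals.get(category, 0) + row[measure]
    return totals


revenue = total_by_category(sales_mart, "revenue")
tickets = total_by_category(support_mart, "tickets")

# Combine results from both marts on the shared dimension attribute.
combined = {
    category: {"revenue": revenue.get(category, 0), "tickets": tickets.get(category, 0)}
    for category in sorted(set(revenue) | set(tickets))
}
print(combined)
```

Because both marts agree on what a product and a category are, figures from one can be lined up against figures from the other without any reconciliation step.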

Hybrid Design

To reduce data redundancy, a hybrid data warehouse is generally kept in third normal form. A normalized relational database, however, is not efficient for business intelligence reports, where dimensional modeling is prevalent. Small data marts therefore draw data from the warehouse and use the specific, filtered data for the facts and dimensions they need. A hybrid architecture also allows the data warehouse to be replaced with a master data repository, which holds operational rather than static information.

Evolution of Enterprise Data Warehouses

With time, data warehouses have become more sophisticated and more capable. Their evolution can be traced through the following stages:

  • Offline Operational Data Warehouse: These warehouses are updated from the operational systems on a regular cycle, usually daily, weekly, or monthly.

  • Offline Data Warehouse: Warehouses at this stage are updated regularly with data from the operational sources, and the warehouse data is stored in a structure designed to facilitate reporting.

  • On Time Data Warehouse: The data in this warehouse is updated after every transaction performed on the source data.

  • Integrated Data Warehouse: These data warehouses assemble data from various areas of the business, so that it is accessible when required.
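
The difference between a warehouse refreshed on a schedule and one updated with every source transaction can be sketched as below. The classes `OfflineWarehouse` and `OnTimeWarehouse` and their methods are hypothetical and only model the timing of updates, not a real loading mechanism.

```python
class OfflineWarehouse:
    """Receives changes only when a scheduled batch load runs."""

    def __init__(self):
        self.rows = []
        self._pending = []

    def on_source_transaction(self, row):
        self._pending.append(row)  # queued until the next load cycle

    def run_batch_load(self):
        self.rows.extend(self._pending)
        self._pending.clear()


class OnTimeWarehouse:
    """Reflects each source transaction as soon as it happens."""

    def __init__(self):
        self.rows = []

    def on_source_transaction(self, row):
        self.rows.append(row)


offline, on_time = OfflineWarehouse(), OnTimeWarehouse()
for txn in [{"order_id": 1, "amount": 50.0}, {"order_id": 2, "amount": 75.0}]:
    offline.on_source_transaction(txn)
    on_time.on_source_transaction(txn)

rows_before_batch = len(offline.rows)  # the offline warehouse is still stale here
offline.run_batch_load()               # e.g. the nightly or weekly cycle
rows_after_batch = len(offline.rows)
print(rows_before_batch, rows_after_batch, len(on_time.rows))
```

Until the batch load runs, reports built on the offline warehouse lag behind the source systems, whereas the on-time variant is current after each transaction.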

Benefits of Data Warehouses

There are ample benefits of deploying data warehouses in business intelligence. Some of them are:

  • Integrates data from various sources into one database, so that information can be presented in an Operational Data Store (ODS) with a single query;
  • Improves data quality by providing consistent codes and descriptions, and by flagging or fixing bad data, while preserving data history;
  • Enables data to be restructured so that it delivers good query performance, from simple to complex analytic queries, without affecting the operational systems.

Data warehouse (DW or DWH) is a term used in computing for a system that is used solely for data analysis and reporting. It is considered one of the core components of business intelligence. Data warehouses are central repositories that integrate data from one or more disparate sources. They store historical as well as current data in one place. This data is of great importance to an organization, because it is used to create analytical reports for workers throughout the enterprise.

The data stored in the data warehouse is uploaded from the operational systems, such as those used for marketing or sales. The data is not always clean and consistent, so it may pass through an operational data store and be cleansed before it is used. This helps the organization ensure that data quality has not been compromised and that the analyses and results derived from the data are accurate. Often the main sources of data are also inspected, cleansed, and transformed, then catalogued and made available to managers in the organization. Business professionals use the data from data warehouses for data mining, market research, online analytical processing, and decision support. The means to retrieve and analyze data, to extract, transform, and load it, and to manage the data dictionary are also considered essential components of a data warehousing system. Thus data warehousing includes business intelligence tools, tools to extract, transform, and load data into the repository, and tools to manage and retrieve metadata.

