Business Intelligence 101: Data within Multidimensional View — Part 2

Business Intelligence 101: Data within Multidimensional View — Part 2

In this part of our blog series, we will understand why data warehouses are needed for data analysis rather than DBMS. We will also dive into the fundamental concepts of multidimensional modeling such as facts, dimensions, measures.

Key differences between DBMS and Data warehouse

When we talk about DBMS and data warehouses we talk about a set of tools that businesses used to store, manage, access, and analyze data. DBMS usually stores each software application’s data in a separate database while a data warehouse is used to consolidate and store data from multiple sources. Together, DBMS and Data warehouses aim to provide valuable insights that can help businesses manage customers, track transactions, and make informed decisions. The following diagram shows key differences between DBMS and Data warehouses

As more organizations rely on data-driven insights, the importance of DBMS and data warehouses will continue to grow. Proper management of data is crucial, and these tools help ensure that data is organized, accurate, and can be easily accessed and analyzed to inform business decisions. Relational database management systems, which are the most widely used today, are not well suited for OLAP because of their differences in objectives, data and query processing needs. The data required for OLAP is multidimensional, such as sales by vendor, date, and city. However, relational database tables represent a flattened view, requiring users to figure out how to find the relationships between tables to recreate the multidimensional view, resulting in poor performance. thus, a model adapted to OLAP is needed in order to represent data in several dimensions and manipulate data efficiently.

Understanding Data marts

A data mart is a subset of a data warehouse, intended to meet the needs of a particular sector or function of the company. It is much more focused on a subject often at the departmental or business level. A Data mart, as opposed to a Data Warehouse, is much smaller in size, which makes it easier to understand and handle. This also leads to improved performance, with response times being more acceptable than the several hundred or even a few TB associated with a data warehouse. As such, Datamarts are becoming increasingly popular due to their faster response times and increased manageability.

data-marts-extension-data-warehouse.png

Multidimensional data representation

Is a method of representing data points using multiple projecting dimensions. Its main goal to facilitate the analysis of business information in the presence of multiple business entities and fully understand the underlying relationships between data points. it’s used in data analysis, machine learning, and data visualization For example, the following figure shows the number of sales projected on 3 dimensions: date, product, and region forming a cube.

  • Dimensions: are entities with attributes which describe them(date, product type, region).

  • Cells: contains a business fact which is the actual value stored such as number of sales that occurred in Boston during march.

exploring-multidimensional-data-representation.png

Further, to create a correct multidimensional model and better exploit it. We should follow a straightforward modeling approach that consists of three levels conceptual, logical, and physical. Each resolves issues of different categories. In the next section, we will focus on the conceptual level and its underlying concepts.

Multidimensional Modeling: Conceptual level

Provides a model that describes how data can be viewed in a multidimensional form and is not reliant on specific implementation decisions. The most important concepts of the conceptual level are the following:

  • Dimension tables and hierarchies.

  • Fact tables and Granularity.

  • Measures.

In the next sections, we will explain and break down these concepts in more detail.

Dimension tables

Dimensions are axes used to analyze data and provide aggregate calculation rules for each fact. Each dimension has one or more attributes/members of descriptive character for the facts.

A dimension can contain members organized in a hierarchy, Each member belongs to a particular hierarchical level. The attributes that define the levels of hierarchy are called parameters.

example-dimension-tables-date-product.png

country-date-dimensions-hierarchies.png

Fact tables

Is the main table of the multidimensional model. It contains the observable data, also called facts, about the studied event/transaction according to different analysis axes (dimensions). It contains what we want to measure (quantities sold, amount of revenue…). It contains the foreign keys of the analysis axes (dimensions): Date, Product, Transaction.

The granularity also named grain of a fact table is the finest level of detail or aggregation in a fact table, the grain is determined by the finest hierarchy level of each dimension would determines the level of detail at which the fact table records its quantitative data.

For example, if we have two dimension tables, a product dimension and a date dimension, the product table contains information about a product, the date dimension has a daily granularity, then the fact table that captures sales data would likely have its granularity on the transaction level capturing daily sales of a product.

fact-table-sales.png

Fact tables are usually divided into three main types:

  • Transaction fact table: is used store detailed information about business events and transactions. It is typically used to track and analyze sales data, inventory levels, customer activity and more.

  • periodic Snapshot: stores historical data about a specific point in time. It is used to capture data at regular time periods, such as daily, weekly or monthly.

  • Factless tables: are only used to relate different dimensions. it stores only dimensions foreign keys.

Measures

The measures are quantitative data types on which the analysis is based according to the different dimensions.These values are the result of aggregation operations on the facts such as
number of sales. We have three principal types of measures :

  • Additive: can be added across all dimensions such as quantities sold, turnover.

  • Semi-additive: can be added according to certain dimensions such as account balance it have no meaning to sum up values based on dates.

  • Non-additive: fact that it cannot be added up regardless of the dimension such as Price per unit.

Slowly Changing Dimension

Is a concept that describes how attributes of a dimension can be changed over a longer period of time, e.g. a product name. Depending on the business requirement, there are 3 main types used to record changes in a dimension.

scd-types-differences.png

Fast Changing Dimension

Dimensions that have attributes undergoes very frequent changes and for which we want to keep the history. if we use type 2 scd the table will contain many rows for this dimension then the solution is to isolate the attributes that change frequently in a separate dimension.

Throughout our blog, we have discussed various aspects of multidimensional data modeling. In the next blog, we will delve deeper into multidimensional modeling schemas such as the star schema and the snowflake schema. Stay tuned for an insightful and informative read!