Data Warehousing models - Star schema
Multidimensional
Schema is especially designed to model data warehouse systems. The schemas are designed to address the unique
needs of very large databases designed for the analytical purpose (OLAP).
Star
schema is mature modeling approach widely adopted by relational data
warehouses. It requires modelers to classify their model table as either
dimension or fact. Dimension tables describe the business entities(the things
you model). Entities can include products,people, places and concepts including
time itself. The most consistent table you’ll find in a star schema is a date
dimension table. A dimension table contains a key column(or columns) that act
as a unique identifier and descriptive columns.
Fact
tables store observations or evets and can be sales orders, stock balances,
exchange rates, temperature, etc. A fact table contains dimension key columns
that relate to dimension tables and numeric measure columns. The dimension key
columns determine the dimensionality of a fact table while the dimension key
values determine the granularity of a fact table. Let’s consider a fact table
designed to store sale targets that has two dimension key columns Date and
ProductKey. It’s easy to understand that the table has two dimensions. The
granularity can’t be determined without considering the dimension key values.
As in the example, consider that the values stored in the Date column are the
first day of each month. In this case, the granularity is at month-product
level. Generally dimension table contain a relatively small number of rows.
Fact tables can contain a very large number of rows and continue to grow over
time.
data:image/s3,"s3://crabby-images/30084/300849b3dfcd66e25cc962bb439874f3431e39db" alt=""
There are
several characteristics of star schema:
· Every dimension
in a star schema is represented with the only one-dimension table.
·
The dimension table should contain the set of attributes.
·
The dimension table is joined to the fact table using a
foreign key
·
The dimension table are not joined to each other.
·
Fact table would contain key and measure
·
The star schema is easy to understand and provides optimal
disk usage.
·
The dimension tables are not normalized.
·
The schema is widely supported by BI tools.
There are
several advantages and disadvantages in this star schema modeling.
Advantages |
Disadvanatages |
A simplified schema means that we
won’t have to write confusingly long queries every time we want some
information from our database. |
Denormalizing our data means that
data anomalies could arise from one-off inserts or updates. In practice, star
schemas are implemented via “trickle feeds” or batch processing to compensate
for this issue. |
We optimized for reads. Now that we
can write fewer |
We have limited analytical
flexibility. A star schema is generally designed for a particular purpose.
Since we have fewer features in the star schema than in the full database, we
are restricted to just what this star schema contains. |
Also, it will business logic for
reporting. We won’t have to explain to stakeholders all the crazy joins that
went into making the schema, just maybe . |
|
References
“Star and SnowFlake Schema in Data Warehouse,” Guru99.
[Online]. Available:
https://www.guru99.com/star-snowflake-data-warehousing.html. [Accessed:
27-May-2020].
Peter-Myers, “Understand
star schema and the importance for Power BI - Power BI,” Power BI | Microsoft Docs. [Online]. Available:
https://docs.microsoft.com/en-us/power-bi/guidance/star-schema. [Accessed:
27-May-2020].
Comments
Post a Comment