8-3 Dimensional Modeling Explained

Key Concepts

Fact Tables
Dimension Tables
Star Schema
Snowflake Schema
Surrogate Keys
Degenerate Dimensions

Fact Tables

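Fact tables are the central tables in a dimensional model. They store numeric measures (such as quantities and amounts) that can be aggregated and analyzed, along with foreign keys to the dimension tables that describe them. Each row in a fact table represents one measurable event or transaction at a consistent level of detail, known as the grain.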

Example: In a sales database, a fact table might store sales transactions, including product ID, quantity sold, and total revenue.

Analogies: Think of fact tables as the main data points on a graph. They represent the core information that you want to analyze and visualize.
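To make the sales example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name fact_sales, its column names, and the sample rows are assumptions chosen for illustration; they do not come from any particular system.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Hypothetical fact table: one row per sales transaction line.
conn.execute("""
    CREATE TABLE fact_sales (
        product_id    INTEGER NOT NULL,  -- points at a product
        quantity_sold INTEGER NOT NULL,  -- measure that can be summed
        total_revenue REAL    NOT NULL   -- measure that can be summed
    )
""")
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 2, 20.0), (1, 1, 10.0), (2, 5, 75.0)],
)

# Measures are meant to be aggregated: total units and revenue per product.
totals = conn.execute("""
    SELECT product_id, SUM(quantity_sold), SUM(total_revenue)
    FROM fact_sales
    GROUP BY product_id
    ORDER BY product_id
""").fetchall()
print(totals)
```

Each inserted row is one event, and the query shows the typical use of a fact table: summing its measures across many rows.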

Dimension Tables

Dimension tables provide context for the measures in fact tables. They hold descriptive attributes, such as time, location, product, and customer details, that are used to filter, group, and label the facts. Each dimension table has a primary key, and the fact table references it through a foreign key.

Example: A dimension table for "Products" might include attributes like product name, category, and price. This table is linked to the fact table through the product ID.

Analogies: Think of dimension tables as the labels on a graph. They provide the context and details that help interpret the main data points.
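The following sketch, again using Python's sqlite3 module, shows how a "Products" dimension might give meaning to the numbers in a fact table. The names dim_product and fact_sales and the sample data are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical product dimension: descriptive attributes only.
    CREATE TABLE dim_product (
        product_id INTEGER PRIMARY KEY,
        name       TEXT,
        category   TEXT,
        price      REAL
    );
    -- Fact table referencing the dimension through product_id.
    -- (SQLite only enforces REFERENCES when PRAGMA foreign_keys = ON.)
    CREATE TABLE fact_sales (
        product_id    INTEGER REFERENCES dim_product(product_id),
        quantity_sold INTEGER,
        total_revenue REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Espresso',  'Beverages', 10.0),
        (2, 'Croissant', 'Bakery',    15.0);
    INSERT INTO fact_sales VALUES (1, 2, 20.0), (1, 1, 10.0), (2, 5, 75.0);
""")

# The dimension supplies the label (category) used to group the measure.
revenue_by_category = conn.execute("""
    SELECT p.category, SUM(f.total_revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(revenue_by_category)
```

Without the join, the fact table can only report revenue per product ID; the dimension is what turns that into revenue per category.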

Star Schema

The star schema is a simple and intuitive dimensional modeling structure. It consists of a central fact table surrounded by dimension tables, each joined directly to the fact table through a foreign key. Because every dimension is only one join away from the facts, the star schema is easy to understand and query, which makes it popular for data warehousing.

Example: In a sales database, the star schema would have a central "Sales" fact table connected to dimension tables like "Products," "Customers," and "Time."

Analogies: Think of the star schema as a star with rays extending from the center. The central fact table is the center of the star, and the dimension tables are the rays.
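As a rough illustration of the "Sales" star described above, the sketch below builds one fact table and three dimensions with Python's sqlite3 module. All table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Three dimensions, each one join away from the fact table.
    CREATE TABLE dim_product  (product_key  INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE dim_time     (time_key     INTEGER PRIMARY KEY,
                               year INTEGER, month INTEGER);

    -- Central fact table: one foreign key per dimension plus a measure.
    CREATE TABLE fact_sales (
        product_key   INTEGER REFERENCES dim_product(product_key),
        customer_key  INTEGER REFERENCES dim_customer(customer_key),
        time_key      INTEGER REFERENCES dim_time(time_key),
        total_revenue REAL
    );

    INSERT INTO dim_product  VALUES (1, 'Espresso'), (2, 'Croissant');
    INSERT INTO dim_customer VALUES (1, 'North'), (2, 'South');
    INSERT INTO dim_time     VALUES (1, 2024, 1), (2, 2024, 2);
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 20.0), (2, 1, 1, 75.0), (1, 2, 2, 10.0);
""")

# Slice revenue by two dimensions at once: customer region and month.
revenue_by_region_month = conn.execute("""
    SELECT c.region, t.month, SUM(f.total_revenue)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_key = f.customer_key
    JOIN dim_time     AS t ON t.time_key     = f.time_key
    GROUP BY c.region, t.month
    ORDER BY c.region, t.month
""").fetchall()
print(revenue_by_region_month)
```

Every join in the query goes straight from the fact table to a dimension, which is the property that keeps star-schema queries short.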

Snowflake Schema

The snowflake schema is an extension of the star schema in which dimension tables are normalized into multiple related tables. This reduces redundancy in the dimensions, but queries need more joins to reach the same attributes, so the schema is harder to read and query.

Example: In a sales database, the "Products" dimension table might be normalized into separate tables for "Product Categories" and "Product Subcategories."

Analogies: Think of the snowflake schema as a snowflake with branches. The central fact table is the core, and the dimension tables branch out in a more complex structure.
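The sketch below shows what the normalized "Products" example might look like, using Python's sqlite3 module. The three-level layout (product, subcategory, category) and all names and rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The product dimension is split into a chain of related tables.
    CREATE TABLE dim_category (
        category_key  INTEGER PRIMARY KEY,
        category_name TEXT
    );
    CREATE TABLE dim_subcategory (
        subcategory_key  INTEGER PRIMARY KEY,
        subcategory_name TEXT,
        category_key     INTEGER REFERENCES dim_category(category_key)
    );
    CREATE TABLE dim_product (
        product_key     INTEGER PRIMARY KEY,
        name            TEXT,
        subcategory_key INTEGER REFERENCES dim_subcategory(subcategory_key)
    );
    CREATE TABLE fact_sales (
        product_key   INTEGER REFERENCES dim_product(product_key),
        total_revenue REAL
    );

    INSERT INTO dim_category    VALUES (1, 'Food'), (2, 'Drink');
    INSERT INTO dim_subcategory VALUES (1, 'Pastry', 1), (2, 'Coffee', 2);
    INSERT INTO dim_product     VALUES (1, 'Espresso', 2), (2, 'Croissant', 1);
    INSERT INTO fact_sales      VALUES (1, 20.0), (1, 10.0), (2, 75.0);
""")

# Reaching the category name now takes three joins instead of one.
revenue_by_category = conn.execute("""
    SELECT c.category_name, SUM(f.total_revenue)
    FROM fact_sales AS f
    JOIN dim_product     AS p ON p.product_key     = f.product_key
    JOIN dim_subcategory AS s ON s.subcategory_key = p.subcategory_key
    JOIN dim_category    AS c ON c.category_key    = s.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()
print(revenue_by_category)
```

Each category name is stored once rather than repeated on every product row, which is the redundancy saving; the longer join chain is the cost.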

Surrogate Keys

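Surrogate keys are unique identifiers, usually sequential integers, generated by the database or the load process to serve as primary keys, most commonly in dimension tables. Fact tables then reference these keys as foreign keys. Surrogate keys are not derived from the business data, so they stay stable when source-system identifiers change, and they keep joins simple.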

Example: In a sales database, a surrogate key might be generated for each sales transaction to uniquely identify it, regardless of the product or customer involved.

Analogies: Think of surrogate keys as internal IDs assigned to each record in a library. They help uniquely identify each book without relying on the book's title or author.
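One common way to generate surrogate keys is to let the database assign an integer and keep the business identifier in a separate column. The sketch below applies this to a hypothetical customer dimension with Python's sqlite3 module; the table dim_customer, the helper customer_key_for, and the codes such as 'CUST-0042' are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_code TEXT NOT NULL UNIQUE,               -- business (natural) key
        name          TEXT
    )
""")

def customer_key_for(code, name):
    """Return the surrogate key for a business code, inserting the row if new."""
    row = conn.execute(
        "SELECT customer_key FROM dim_customer WHERE customer_code = ?",
        (code,),
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO dim_customer (customer_code, name) VALUES (?, ?)",
        (code, name),
    )
    return cur.lastrowid  # the integer the database just generated

k1 = customer_key_for("CUST-0042", "Ada")
k2 = customer_key_for("CUST-0007", "Grace")
k3 = customer_key_for("CUST-0042", "Ada")  # same business code, same key
print(k1, k2, k3)
```

Fact rows would store customer_key rather than customer_code, so a change to the business code format would not touch the fact table.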

Degenerate Dimensions

Degenerate dimensions are dimension attributes that are stored directly in the fact table instead of in a separate dimension table. They are typically transaction identifiers that have no descriptive attributes of their own, so a dedicated dimension table would contain nothing but the identifier itself.

Example: In a sales database, attributes like invoice number or order number might be stored directly in the fact table as degenerate dimensions.

Analogies: Think of degenerate dimensions as sticky notes attached to the main data points. They provide additional context without needing a separate label.
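A minimal sketch of the invoice number example, using Python's sqlite3 module, might look like this. The table layout and the invoice values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (
        invoice_number TEXT,     -- degenerate dimension: no table behind it
        product_key    INTEGER,  -- would reference a product dimension
        quantity_sold  INTEGER,
        total_revenue  REAL
    );
    INSERT INTO fact_sales VALUES
        ('INV-1001', 1, 2, 20.0),
        ('INV-1001', 2, 1, 15.0),
        ('INV-1002', 1, 1, 10.0);
""")

# The invoice number still works as a grouping attribute,
# for example to rebuild per-invoice line counts and totals.
invoice_totals = conn.execute("""
    SELECT invoice_number, COUNT(*), SUM(total_revenue)
    FROM fact_sales
    GROUP BY invoice_number
    ORDER BY invoice_number
""").fetchall()
print(invoice_totals)
```

The invoice number groups fact rows exactly as a dimension attribute would, yet no join is needed because there is nothing else to look up about it.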

Conclusion

Dimensional modeling is a powerful technique for designing data warehouses that are optimized for querying and analysis. By understanding fact tables, dimension tables, star schema, snowflake schema, surrogate keys, and degenerate dimensions, a Database Specialist can create efficient and effective data models that support business intelligence and decision-making.