Question

We have started designing a process for detecting changes in our ERP database in order to build data warehouse databases. Since they don't like placing triggers on the ERP databases, or even enabling CDC (SQL Server), we are thinking of reading changes from the databases that get replicated to a central repository through transactional replication, and then keeping an extra copy into which we merge the changes (we will have CDC on the extra copy)...

I wonder whether data that changes within, let's say, 15 minutes can be important enough to justify a change in our design. The way we plan to build this, we would not be able to track every single change; we would only get the latest value after a period of time. For example, if a value on a row changes from A to B and then, 1 minute later, from B to C, the replication system will bring only that last value to the central repository. We will then merge the table with our extra copy: the extra copy might have had the value A, it will be updated with C, and we will have lost the value B.
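The A to B to C scenario can be simulated with a minimal sketch. The change times, the 15-minute interval, and the helper names `value_at` and `snapshot_history` are illustrative assumptions, not part of any replication or CDC API:

```python
# Hypothetical change stream for one row: (minute, new_value).
# B is written at minute 16 and overwritten by C one minute later.
changes = [(0, "A"), (16, "B"), (17, "C")]


def value_at(minute, changes):
    """Return the latest value written at or before the given minute."""
    current = None
    for ts, value in changes:
        if ts <= minute:
            current = value
    return current


def snapshot_history(changes, interval, end):
    """Distinct values seen by a merge job that reads the row every `interval` minutes."""
    seen = []
    for minute in range(0, end + 1, interval):
        value = value_at(minute, changes)
        if not seen or seen[-1] != value:
            seen.append(value)
    return seen


full_history = [value for _, value in changes]
polled_history = snapshot_history(changes, interval=15, end=30)
# Values that existed in the source but never reached the merged copy.
missed = [v for v in full_history if v not in polled_history]
```

With these assumed timings the merge job reads the row at minutes 0, 15, and 30, so `polled_history` holds A and C while `missed` holds B. Whether that loss matters is exactly the design question being asked.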

Is there a good scenario in a data warehouse database where you need to track ALL the changes a table has gone through?

Solution

Keeping historical data in a DW is important in some cases, such as:

  1. When a dimension value changes. Say, a supplier merged with another and changed its commercial name.

  2. When the fact table uses calculations derived from information outside the fact table, and that information changes. A conversion rate change is one example.

  3. When you need to run queries that reflect fact information in previous periods (versions of the fact table).

Examples where every change matters include a bank account's balance, a storage warehouse item count, or a stock price.

For your particular case, you should check with your customer how the system will be used and what exactly its benefits are, and design accordingly. How granularly changes should be captured (every hour, every day, etc.) is primarily your customer's call.

Some techniques for handling changes in dimension data are described in Kimball's Slowly Changing Dimensions.
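As an illustration of one of those techniques, here is a minimal in-memory sketch of a Type 2 slowly changing dimension, applied to the supplier rename from case 1. The `SupplierRow` class, the function names, and the supplier names and dates are all hypothetical; a real warehouse would do this with table rows and surrogate keys:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class SupplierRow:
    supplier_id: int
    name: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_type2_change(rows: List[SupplierRow], supplier_id: int,
                       new_name: str, change_date: date) -> List[SupplierRow]:
    """Close the current version of the supplier and append a new one."""
    for row in rows:
        if row.supplier_id == supplier_id and row.valid_to is None:
            if row.name == new_name:
                return rows  # nothing changed, keep the current version open
            row.valid_to = change_date
    rows.append(SupplierRow(supplier_id, new_name, change_date))
    return rows


def name_as_of(rows: List[SupplierRow], supplier_id: int,
               as_of: date) -> Optional[str]:
    """Return the supplier name that was valid on the given date."""
    for row in rows:
        if row.supplier_id != supplier_id:
            continue
        if row.valid_from <= as_of and (row.valid_to is None or as_of < row.valid_to):
            return row.name
    return None


# Made-up data: the supplier is renamed after a merger on 2023-06-01.
dim = [SupplierRow(1, "Acme Supplies", date(2020, 1, 1))]
apply_type2_change(dim, 1, "Acme-Globex", date(2023, 6, 1))
old_name = name_as_of(dim, 1, date(2021, 1, 1))
new_name = name_as_of(dim, 1, date(2024, 1, 1))
```

Because the old row is closed rather than overwritten, facts dated before the merger can still be reported under the name that was valid at the time.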

Other suggestions

In direct answer to your question: it depends on the application.

Examples: The value is the description field of an item in some inventory, where the items themselves do not change (i.e. item ID X is always a sparkly-thingy). In this case, saving short-lived descriptions is probably not required.

The value is the last reading of a temperature sensor. If it goes over a certain value, action is taken to bring the temperature back down. In this case you certainly need to save each and every change.

This raises three points:

  1. The second case, where every single change is required, shows very bad design if a single value is updated in place. A well-designed system would surely insert new values with a timestamp into a table rather than update a single value.

  2. Bad designs do exist. Hence:

  3. The amount of data being warehoused depends on the nature of the data.
    a. Will you be able to derive any intelligence from your warehoused data?
    b. Will you be able to know based on changes at the database level what happened at the business level?
    c. What happens to your data when the database schema changes because you upgraded the ERP product?
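The append-only design from point 1 can be sketched as follows. The `readings` list, the helper names, and the sample temperatures are made up for illustration; in a database this would be an insert-only table with a timestamp column:

```python
from datetime import datetime

# Append-only log: one (timestamp, celsius) tuple per reading.
readings = []


def record_reading(timestamp, celsius):
    """Insert a new reading instead of overwriting a single 'last value'."""
    readings.append((timestamp, celsius))


def readings_over(threshold):
    """Return every reading above the threshold, including short-lived spikes."""
    return [(ts, c) for ts, c in readings if c > threshold]


# Made-up sample data: a one-minute spike above 75 degrees.
record_reading(datetime(2024, 1, 1, 12, 0), 71.5)
record_reading(datetime(2024, 1, 1, 12, 1), 80.2)
record_reading(datetime(2024, 1, 1, 12, 2), 69.9)

latest = readings[-1][1]
spikes = readings_over(75.0)
```

With this layout the latest value is still available, and a periodic extract sees the spike as an ordinary row, so nothing depends on capturing every update to one cell.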

I'm wondering whether saving a log of changes at the table level is usable at all. You might be able to reverse engineer what a set of changes means and then save that to the warehouse, or actually get the ERP to "tell" you what it has done and save those changes.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow