Normalising shared entities and applying constraints

https://dba.stackexchange.com/questions/90137

13-12-2020

Question

I'm looking for a specific "best practice" or "pattern" concerning entities that are shared between different entities, having a relation to one of many.

For example, one may have the generic entity "Address", which could be used to store the common address fields for customers, suppliers, employees, etc...

Would a seasoned DBA take that route or would he rather add the fields to the corresponding entities? I'm also thinking about maintainability, constraints that may (in the future) differ depending on the entity, things like that.

I would love to get references to any authoritative or established works on the subject.

Solution

I can't point to anything authoritative, but if you think through the details of such an implementation, the potential down sides are evident.

Having a "shared" subsidiary table is problematic primarily because if two "parent" tables (say, Customer and Employee) both utilize the same Address table to store addresses, you can't use a foreign key constraint to ensure referential integrity between the Address records and the corresponding parent records.

Even if you forgo the use of foreign key constraints, you'll face some conundrums when setting up your foreign key columns. You basically have three options, none of which is ideal:

1. Put an AddressID field in the Customer and Employee tables. The problems here are (a) you can't guarantee that an address won't be used more than once, and (b) if AddressID is auto-assigned, you can't store the address reference in the Customer/Employee table until the Address row is created, which is backwards from how you'd probably want to insert the records (i.e., it makes Address the "parent" table).
2. Put CustomerID and EmployeeID columns in the Address table. Problems here include (a) again, you can't prevent double-use of an address, (b) it wastes space since you're storing a NULL in one column or the other, and (c) it doesn't scale well as you find more entities that need addresses.
3. Collapse CustomerID/EmployeeID/etc. into a single column ParentID and another column ParentTypeID that distinguishes Customers from Employees. This scales better than option 2 above, but has its own issues, such as the need to assign parent tables "magic numbers" for the ParentTypeID.
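To make the integrity problem with the third option concrete, here is a minimal sketch using Python's built-in sqlite3 module. The column names other than ParentID and ParentTypeID, the sample rows, and the type codes 1 and 2 are assumptions made up for illustration; the point is only that the database accepts an address whose parent does not exist, because ParentID cannot be declared as a foreign key to two tables at once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
    CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
    CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, Name TEXT NOT NULL);

    -- Option 3: ParentID points at Customer for some rows and Employee for
    -- others, so it cannot carry a REFERENCES clause.
    -- Hypothetical "magic numbers": 1 = Customer, 2 = Employee.
    CREATE TABLE Address (
        AddressID    INTEGER PRIMARY KEY,
        ParentTypeID INTEGER NOT NULL CHECK (ParentTypeID IN (1, 2)),
        ParentID     INTEGER NOT NULL,
        Street       TEXT NOT NULL
    );
""")

conn.execute("INSERT INTO Customer (CustomerID, Name) VALUES (1, 'Acme Ltd')")

# A valid address for customer 1.
conn.execute(
    "INSERT INTO Address (ParentTypeID, ParentID, Street) VALUES (1, 1, '1 Main St')"
)
# An orphan: there is no employee 99, yet nothing stops the insert.
conn.execute(
    "INSERT INTO Address (ParentTypeID, ParentID, Street) VALUES (2, 99, '2 Side St')"
)

# Orphans can only be found after the fact, with a query like this one.
orphans = conn.execute("""
    SELECT COUNT(*) FROM Address a
    WHERE (a.ParentTypeID = 1 AND NOT EXISTS
              (SELECT 1 FROM Customer c WHERE c.CustomerID = a.ParentID))
       OR (a.ParentTypeID = 2 AND NOT EXISTS
              (SELECT 1 FROM Employee e WHERE e.EmployeeID = a.ParentID))
""").fetchone()[0]
print(orphans)
```

Triggers or application code could police this, but that is extra machinery standing in for what a plain foreign key would otherwise give you.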

I won't say there aren't places where such approaches are pragmatic, but you do have to consider whether it's really that big a deal to have similar columns in multiple tables, and whether moving them to a common table really buys you anything for the increased complexity in inserts, updates, joins, etc. IMHO, it's usually better for each table to have clear constraints on how it relates to other tables, and for the "parent" tables to represent the core business objects being created (customers, employees, etc.), not subsidiary complex types that happen to be common to several other tables.

Other tips

The motivation for piling every kind of address into a single table is usually a misinterpretation and misapplication of the notion of code reuse.

People can make the mistake of assuming that because two entities have some common set of attributes, those attributes belong in their own table. Sometimes entities have similar or identical columns coincidentally. One wouldn't create a table for every NAME or DESCRIPTION or EFFECTIVE_DATE in a database; at least I hope one wouldn't be tempted to do this.

Some people mistakenly refer to every instance of moving columns out to their own table as normalisation. Normalisation involves moving columns to their own tables, but not every instance of moving columns in this way is actually normalisation. Normalisation prescribes very specific reasons for removing columns from a table. If none of these reasons apply, then you aren't normalising; you're just making things complicated.

You have two correct ways of thinking about this. Either your addresses all belong in one pile because you have an entity super-type that incorporates all of the common features of several entity subtypes, including addresses; or your addresses belong in separate piles (tables) according to each kind of thing that has an address, and you write your procedural code against an IAddress interface which is implemented for each address table.

If you actually have an entity super-type, say LEGAL_ENTITY which has subtypes like CUSTOMER, VENDOR, EMPLOYEE and so forth, then having an ADDRESS table that is a child of LEGAL_ENTITY is a legitimate approach. It may even be a valuable approach if there is significant overlap between your customers, vendors, employees (or whatever you're tracking) because you can change addresses once instead of in multiple locations when a legal entity moves. On the other hand, if you don't have such a super-type, then you are going to face the problems pointed out by Richard Tallent.
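As a rough sketch of the super-type arrangement, again using Python's sqlite3 module: the table names LEGAL_ENTITY, CUSTOMER, EMPLOYEE and ADDRESS come from the paragraph above, while the column names and sample data are assumptions for illustration. Because ADDRESS has exactly one parent table, an ordinary foreign key works, and one update changes the address for every role the legal entity plays.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default

conn.executescript("""
    CREATE TABLE LEGAL_ENTITY (
        LEGAL_ENTITY_ID INTEGER PRIMARY KEY,
        NAME            TEXT NOT NULL
    );
    -- Each subtype shares the super-type's key.
    CREATE TABLE CUSTOMER (
        LEGAL_ENTITY_ID INTEGER PRIMARY KEY REFERENCES LEGAL_ENTITY (LEGAL_ENTITY_ID)
    );
    CREATE TABLE EMPLOYEE (
        LEGAL_ENTITY_ID INTEGER PRIMARY KEY REFERENCES LEGAL_ENTITY (LEGAL_ENTITY_ID)
    );
    -- ADDRESS is a child of the super-type only, so a plain FK is enough.
    CREATE TABLE ADDRESS (
        ADDRESS_ID      INTEGER PRIMARY KEY,
        LEGAL_ENTITY_ID INTEGER NOT NULL REFERENCES LEGAL_ENTITY (LEGAL_ENTITY_ID),
        STREET          TEXT NOT NULL
    );
""")

# One legal entity that is both a customer and an employee.
conn.execute("INSERT INTO LEGAL_ENTITY VALUES (1, 'Jane Doe')")
conn.execute("INSERT INTO CUSTOMER VALUES (1)")
conn.execute("INSERT INTO EMPLOYEE VALUES (1)")
conn.execute("INSERT INTO ADDRESS (LEGAL_ENTITY_ID, STREET) VALUES (1, '1 Main St')")

# An address for a non-existent legal entity is rejected by the database.
try:
    conn.execute("INSERT INTO ADDRESS (LEGAL_ENTITY_ID, STREET) VALUES (99, 'Nowhere')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# A single update covers Jane in both of her roles.
conn.execute("UPDATE ADDRESS SET STREET = '5 New Rd' WHERE LEGAL_ENTITY_ID = 1")
streets = conn.execute("""
    SELECT 'customer' AS ROLE, a.STREET
      FROM CUSTOMER c JOIN ADDRESS a USING (LEGAL_ENTITY_ID)
    UNION ALL
    SELECT 'employee' AS ROLE, a.STREET
      FROM EMPLOYEE e JOIN ADDRESS a USING (LEGAL_ENTITY_ID)
""").fetchall()
print(orphan_rejected, sorted(streets))
```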

If you keep your addresses in different tables according to what type of entity owns the address, then you can still achieve code reuse, assuming the language you are using supports interfaces.
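One way this might look in Python is sketched below, with typing.Protocol standing in for the IAddress interface. The field names, the two row classes, and the format_label helper are all hypothetical; they only illustrate that shared code can be written once against the interface while each table keeps its own key and constraints.

```python
from dataclasses import dataclass
from typing import Protocol


class IAddress(Protocol):
    """The fields that shared code is allowed to rely on (assumed names)."""
    street: str
    city: str
    postal_code: str


@dataclass
class CustomerAddress:
    """A row from a hypothetical CustomerAddress table."""
    customer_id: int
    street: str
    city: str
    postal_code: str


@dataclass
class EmployeeAddress:
    """A row from a hypothetical EmployeeAddress table."""
    employee_id: int
    street: str
    city: str
    postal_code: str


def format_label(address: IAddress) -> str:
    """Format a mailing label for any object that satisfies IAddress."""
    return f"{address.street}\n{address.postal_code} {address.city}"


customer_label = format_label(CustomerAddress(1, "1 Main St", "Springfield", "12345"))
employee_label = format_label(EmployeeAddress(7, "2 Side St", "Shelbyville", "54321"))
print(customer_label)
print(employee_label)
```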

As an aside: tvCa pointed out in a comment that addresses may be stored as columns rather than as rows in a separate table. That will depend largely on how many addresses you need for each entity. If you are tracking two addresses (physical, mailing) or if you are storing address history then go with an address table. If you only store one address per addressee, then a separate table is likely overkill.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange