Natural or Surrogate Key

Autonumbers, technical keys, intelligent keys too

Added 2002-06-20

Great discussion on WebDesign-L today about database tables, including one-to-many and many-to-many designs, and the use of keys. The opportunity to talk about country code came up -- or perhaps it was just my imagination. I mentioned it anyway, because country code is a great example of a natural key. In the wild, country code is almost always a foreign key, referencing the usually non-existent country table.

A recent article of mine, Using generated numbers for primary keys, part 2 also mentions country code. I favour surrogate keys in most cases, but I love country code as an example of a natural key. Being of good open mind (a requisite skill for a data modeller), I also included links to other articles:

Since then, I've found additional articles on natural and surrogate keys. SQLteam.com has Identity and Primary Keys - Part I, which includes "some very good reasons NOT to use an Identity column." Understanding Identity Columns is a practical how-to for SQL/Server identity columns.

Then at Builder.com.com (hey, don't blame me, that's their domain name) I discovered The great primary-key debate. Something about this article made me uneasy right away:

There are six points there, right? First, unique. Check. Second, not null. Check. But the third point is the same as the second -- the primary key cannot be null. The fourth and sixth points are sort of the same One of them deals with redesign, which is not really germane, and the other with changing values. As for compactness, "fewest possible" is either an unclear reference to normalization, or a general design principle nobody would argue with. So really, there are only three rules and a suggestion. And one of the rules is wrong! Primary keys do change. Both their design, when that's required, and their values. It happens all the time. But the article says "Note that the word must in the above list doesn’t mean perhaps or most of the time -- must is absolute."

There's a nice chart on page 2 of the article, but it perpetuates the kind of thinking just shown. At the end, the authors say "To avoid future problems and subsequent (and perhaps convoluted) repairs, we recommend that you use surrogate keys." I happen to agree with this recommendation, but not with the way they arrived at it. And apparently they've never migrated surrogate keys from one database to another, because that's painful as well as convoluted. There are several dozen lively comments posted by readers at the end of the article, ranging from "totally agree" to "flat wrong."

Two other articles on the same site also disappointed me recently. Intelligent keys aren't that dumb presents a good argument for not designing a key with "built-in" meaning, thus seeming to contradict the title. For example, don't use a 1 as part of the order number to mean the first order for a customer that day. No trained data modeller would do this, though. The article concludes that when you "finally discard their precious intelligent key, [the users] will recognize it as an intelligent decision." Well, sure. But in reaching this conclusion, the author has trampled all over the concept of a natural key. "The main key -- the primary key -- should never contain data entered by users." The article never actually uses the term surrogate key, but it's there. "If the users need an invoice number to print on a piece of paper for a warehouse worker, then give it to them. However, the database ought to handle all the hidden data according to the rules of normalization." There's good advice in the article, especially about dealing with users during a database design project, but the argument for surrogate keys is weak.

Finally, I recently read the Builder.com article Database design for platform independence. It starts out with great intentions, but immediately makes two suggestions that I would call "howlers" -- don't use stored procedures and don't use joins. I can understand where the author is coming from, but that's ridiculous, and as one person stated in the comments posted to the article, it shows "how a corporate dictum like 'All database applications must be independent of vendor specific features' is not only impractical but dangerous. What pointy headed manager came up with this?"

Even more hilarious were the descriptions of the types of joins (as if that mattered, since you shouldn't be using them) --

In the comments posted after the article, the author gets duly reamed for those definitions. It's sad, really.

There was a nice attempt to identify "safe" datatypes. Anyone who has migrated an autonumber or identity field to an Oracle sequence number will know the effect that this can have on SQL. But would you avoid using a generated number just to make the application "cross-database"? That'd be silly. Even the author says some datatypes, like Microsoft Access’ AUTONUMBER, "are quite handy."

If you have any comments or questions about natural or surrogate keys, please contact me.