Wednesday, January 05, 2005

One to One Relationships

After many weeks, I am back with a question for you all. How do you feel is the best way to handle a one-to-zero or one relationship in a database? Let me explain what I mean.

First, I feel that there are two scenarios where a one-to-zero or one relationship applies. The first example represents two distinct objects, such as an employee and a computer, where a business rule states that one employee cannot have more than one computer, and one computer can only belong to one employee. I would probably argue that in time, this will turn out to be a false business rule, as when one employee leaves, the computer is assigned to another employee. Or when the computer dies, the employee gets another one. But for now, lets just pretend that the one-to-zero or one applies here. I think it makes sense for there to be an employee table and a computer table. Each table would have its own primary key, and lets just assume that they key is an auto-incremented identity. How should the tables be related? Should we add an EmployeeID field to the Computer table? Or should we add a ComputerID field to the Employee table? Here, it really shouldn't matter. We might choose to put the ComputerID in the Employee table because we feel that "employee has a computer" makes more sense than "computer has employee." Either way, it shouldn't really matter, though.

But how about in the second situation where we have a one-to-zero or one relationship to represent a supertype/subtype relationship? For example, we have a supertype of Individual with subtypes of Employee, Contact, and Owner. The Individual table has fields for the first name, middle name, last name, address, phone number, SSN, and there is an auto-incremented identity column which holds the IndividualID. Let's assume that the Employee table has fields for the employee number, title, related department, hire date, start date, and end date. The Contact and Owner table also have unique field in them. So how do we relate the Employee table to the Individual table? How do we represent the "Employee is an Individual" relationship in the database?

I have seen this handled in two ways. First, we could have the primary key of the employee table be the IndividualID. This is also a foreign key to the Individual table. So the IndividualID would exist as the primary key of four tables, the Individual table, the Employee table, the Contact table, and the Owner table. In the Individual table, Employee table, and Contact table, the IndividualID would also be a foreign key to the Individual table.

Or, we could have the primary key of the Employee table be an auto-incremented identity field called EmployeeID. We could then store this ID as a foreign key in the Individual table. So the Individual table would end up with three foreign keys, the EmployeeID, ContactID, and OwnerID. Each of these fields would allow null, and only one field should contain a non-null value.

Which solution is the better one?

13 comments:

Justin J. Vogt said...

Val... your post only takes into consideration Relational databases. Maybe neither suggestion is the best answer.

Think outside the box, we shouldn't let Microsoft's current database implementations stand in our way of representing data the way we want.

JavaKid.

Justin J. Vogt said...

Val... your post only takes into consideration Relational databases. Maybe neither suggestion is the best answer.

Think outside the box, we shouldn't let Microsoft's current database implementations stand in our way of representing data the way we want.

JavaKid.

Valerie Vogt said...

JavaKid-

Are you suggesting we stop using Microsoft tools? Or are you suggesting we build our own object-oriented database? I work in a consulting company who is a Microsoft Gold Partner, and since we specialize in Microsoft tools, I agree that I may tend to get stuck on the idea that I will deliver my enterprise solutions on the Microsoft platform.

Do you know of any good object-oriented databases that you would recommend for enterprise solutions that use Microsoft's .Net platform? I would be very interested in hearing more.

--Val the C# Gal

HAVEWORLD said...

I prefer the first option, IndividualID in all 4 tables.

1) less columns
2) more logical since Individual is the supertype

Andy Gaskell said...

"In the Individual table, Employee table, and Contact table, the IndividualID would also be a foreign key to the Individual table."

Something about that isn't right or I am misunderstanding. Is it supposed to say "In the Owner table...".

In this example both approaches make coding the data access objects (and probably business objects) cumbersome. Now you have to update two tables to edit an employee. To get information on an employee you have to query two tables. It gets to be a real headache.

I've seen some alternate approaches and I like them better. First one would be to simply remove the Individual table. Any columns in the Individual table would be added to the Employee, Owner and Contact tables.

Another approach is to remove Employee, Owner and Contact tables, add some columns to the Individual table, and add a IndividualType table. The IndividualType table would store things like name ("Employee"), description ("Employees work for us"), and information on how to create a concrete object. In .NET this information would be an Assembly column ("MySweetAssembly") and a type column ("MySweetAssembly.Employee"). You would use Activator.CreateInstance to create these objects.

The additional columns would be a foreign key to the IndividualType table and an XML column. The XML column holds XML that the Individual cares about. The schema would different for Employee, Owner, and Contacts. Your business objects will have to know what to do with this XML. The downside to this approach is querying the XML.

This stuff is difficult to write about. If you need a concrete example, let me know.

As far as the one-to-one example, unless the Employee table is huge, I would just add columns to the Employee table to hold information about computers.

How you represent this data as objects in your code is a different story. Once you start designing these objects, "Employee has computer" makes much more sense than "Computer has employee" does to me. Your example even states "employee cannot have more than one computer, and one computer can only belong to one employee".

Justin J. Vogt said...

Val,

From reading your post it sounds like there are a couple ways to handle your situation but neither are ideal. My entire software development career has been spent working in a very object oriented environment.

It is often very difficult to represent object oriented relationships in a relational database (which you can find well documented all over the internet). I have not worked with any good Object Oriented databases but that doesn't mean the current implementation relational database should be the only answer.

I am simply asking you to look at your problem, and come up with an ideal way to solve the situation.... maybe someone can find a way to make your ideal situation come true.

JavaKid.

Valerie Vogt said...

Andy-

You are right. I should have said "In the Owner table, Employee table, and Contact table, the IndividualID would also be a foreign key to the Individual table."

I do agree that you could get around this whole issue by either including the Indivudal columns in each of the Owner, Employee, and Contact tables and getting rid of the Indivudual table. That solution just doesn't sit right with me because of the duplication of information. So in that scenario, if an Invidual has a new attribute, you have to add the column to three tables instead of one.

I also agree that a solution is to move all of the columns from the Owner, Employee, and Contact table into the Individual table, and only use the columns that apply. I know that database storage space is not really an issue any longer, and maybe it just a throwback from those days, but when I see a bunch of columns that should be null in 1/3 of the cases, I hate including them in the table.

I do have my domain objects and mappers set up to support the "is a" relationship, I was just wondering if there is a commonly accepted practice that I do not know about.

Thanks for your feedback-
Val

Andy Gaskell said...

With the XML column you would have no null columns. You would store information specific to that type of individual. A simple example - Let's say that besides all of the attributes of an individual, employees also have DateOfHire and ReportsTo. In the XML column of the Individual table, you would store some XML that looks like this:

< EmployeeInfo >
< DateOfHire >1/1/2003< /DateOfHire >
< ReportsTo >Someone< /ReportsTo >
< /EmployeeInfo >

For Contacts lets say you have home address:
< ContactInfo >
< HomeAddress >123 Fake St< /HomeAddress >
< /ContactInfo >

Now you have flexibility and no null columns.

Valerie Vogt said...

Andy-

Cool idea. I have not heard of that solution before. So you have to use an stream reader to get the data out of the XML file, which is fine because your domain layer can do that work for you. Have you used this approach before? Were there any reporting implications?

Thanks-
Valerie

Jake Good said...

Our project has used a similar approach to reusing tables or tables across multiple subtypes...

One direction we used the "Type" indicator in the database and our mappers worked acchordingly.. with the table holding all of the needed columns... Might not be the best normal form...

We also used the "XML" approach but instead gave each specific subtype the ability to represent itself as a string (XML in your case) .. so we used XML for one subtype and string that was appropriate for the others.. all in one column. The reporting implication, you still can select across it using full text search or like... but you will get more noise...

I would make a decision if the extra information that is used for those subtypes is something that you'll want to easily snag from the database.

Jake

Jake

Anonymous said...

I personally prefer to have the ID of the parent table in each of the children. Because you haven't duplicated all the information into the child tables, this allows you quick search access to all the information from which you can drill down to get at a lower level of detail.

With regards to XML, I currently don't like this approach. Like Andy said, querying isn't very good. This could change with Sql2k5's introduction of an actual XML column type that can be queried...but I'm still not sold that it is the best way to go.

Steve Austin said...

Interesting blog. I have a xml converter blog.

Anonymous said...

Hi,

I was just looking around the net for web sites related to love relationship test and came across your blog. I was going to add a blog to my site, for love relationship test and of course other related material, but I'm not sure if it would work.
I'm a bit worried about getting un-wanted 'rude' posts rather than ones related to love relationship test on my site...... perhaps I just try it out - then you can come and post on it :)

Take care
Stewart