Friday, June 5, 2009

What’s in a name?

Or more accurately what’s in an ID?

ID formats can vary widely from one system to another. In many of the legacy systems I’ve seen, these IDs do a whole lot more than uniquely identify the Protocol or Grant. In fact many also contain embedded data. Now, I’m a bit of a purist when it comes to the role of an ID in any system. My preference is that they do their one job and do it well: Uniquely identify something. Clean, clear, and to the point. If there is other information that should be a part of the protocol or grant application then it’s easy enough to define additional properties to do that. Each property can be clear in purpose and provide the flexibility to be presented, searched, and sorted however the application demands.

I’ve seen many proposed ID formats that embed other information, such as the year the protocol was created, the version number, and yes, even the protocol status. All of these are better off being distinct data elements and not part of an ID. I can offer some practical reasons why I feel this way:

  1. An ID must remain constant throughout the life of the object it is identifying
    The purpose of an ID is to provide the user a way to uniquely refer to an object. We come to depend upon this value and things would get confusing if we were never sure if the ID we used before will still work. If additional data is embedded into an ID, the risk of the ID having to change because the embedded value changes is real. If this happens, all trust of the correctness of the ID is lost.
  2. Don’t force the user to decode an ID in order to learn more about an object
    It’s easier to read separate fields than it is to force the user to decode a sometimes cryptic abbreviation of data. My preference would be to store each field individually and, wherever it makes sense to do so, display the additional fields alongside the ID. Keeping the fields separate also allows for the ability to search, sort, and display the information however you wish.
  3. All required data may not be known at the time the ID is set
    If some of the additional information embedded in the ID is not known until the user provides it, there is a good chance that it isn’t known when the ID needs to be generated. This can happen quite easily, because the ID is set at the time the object is created. Addressing this issue can get tricky depending upon the timing of creation so it’s best to avoid the problem by not embedding data.
  4. When using a common ID format, devoid of additional data, the ID generation implementation can be highly optimized
    This is the case within Click Commerce Extranet and altering the default implementation can have an adverse performance impact. We know this to be true because our own implementation has evolved over time. This evolution was driven in part because we try to strike a balance between an easy to generate unique ID such as a GUID and one that is human readable and easier to remember, but also because of the need to avoid system level contention if multiple IDs need to be generated at the same time.

The ID format we use in the Extranet product is the result of attempting to strike a balance between performance, uniqueness, and human readability. Also, since IDs are unique within type, we introduced the notion of a Type specific ID Prefix.

Often the biggest challenges in adopting a new ID convention aren’t technical at all. They’re Human. When transitioning from one system to another (Click Extranet isn’t really different in this respect) there are a lot of changes thrust upon the user and change is difficult for many. Users may have become accustomed to mentally decoding the ID to learn more about the object but, in my experience, avoiding the need to do that by keeping the values separate ultimately makes for an easier system to use and comprehend.

Cheers!