Friday, June 5, 2009

What’s in a name?

Or, more accurately, what’s in an ID?

ID formats can vary widely from one system to another. In many of the legacy systems I’ve seen, these IDs do a whole lot more than uniquely identify the Protocol or Grant. In fact, many also contain embedded data. Now, I’m a bit of a purist when it comes to the role of an ID in any system. My preference is that they do their one job and do it well: uniquely identify something. Clean, clear, and to the point. If there is other information that should be part of the protocol or grant application, then it’s easy enough to define additional properties to hold it. Each property can be clear in purpose and provide the flexibility to be presented, searched, and sorted however the application demands.

I’ve seen many proposed ID formats that embed other information, such as the year the protocol was created, the version number, and yes, even the protocol status. All of these are better off being distinct data elements and not part of an ID. I can offer some practical reasons why I feel this way:

  1. An ID must remain constant throughout the life of the object it is identifying
    The purpose of an ID is to provide the user a way to uniquely refer to an object. We come to depend upon this value, and things would get confusing if we were never sure whether the ID we used before would still work. If additional data is embedded into an ID, there is a real risk that the ID will have to change because the embedded value changes. If this happens, all trust in the correctness of the ID is lost.
  2. Don’t force the user to decode an ID in order to learn more about an object
    It’s easier to read separate fields than it is to decode a sometimes cryptic abbreviation of data. My preference would be to store each field individually and, wherever it makes sense to do so, display the additional fields alongside the ID. Keeping the fields separate also lets you search, sort, and display the information however you wish.
  3. All required data may not be known at the time the ID is set
    If some of the additional information embedded in the ID is not known until the user provides it, there is a good chance that it isn’t known when the ID needs to be generated. This can happen quite easily, because the ID is set at the time the object is created. Addressing this issue can get tricky depending upon the timing of creation, so it’s best to avoid the problem by not embedding data.
  4. When using a common ID format, devoid of additional data, the ID generation implementation can be highly optimized
    This is the case within Click Commerce Extranet, where altering the default implementation can have an adverse performance impact. We know this to be true because our own implementation has evolved over time. That evolution was driven partly by the effort to balance an easy-to-generate unique ID, such as a GUID, against one that is human readable and easier to remember, and partly by the need to avoid system-level contention when multiple IDs must be generated at the same time.
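
To make the first two points concrete, here is a minimal Python sketch of a record whose ID is opaque and whose other data lives in separate fields. The `Protocol` class, its field names, and the sample ID values are all hypothetical and chosen only for illustration; this is not the Extranet data model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Protocol:
    """Hypothetical record: the ID is opaque, everything else is a field."""
    id: str
    year_created: int
    version: int
    status: str


# Sample data; the ID values are made up for this example.
protocols = [
    Protocol("PRO00000003", 2009, 1, "Draft"),
    Protocol("PRO00000001", 2008, 3, "Approved"),
    Protocol("PRO00000002", 2009, 2, "In Review"),
]

# A status change touches only the status field; the ID stays constant.
first = protocols[0]
updated = replace(first, status="Approved")

# Separate fields can be searched and sorted directly, with no ID parsing.
from_2009 = [p.id for p in protocols if p.year_created == 2009]
by_version = [p.id for p in sorted(protocols, key=lambda p: p.version)]
```

Had the status or version been embedded in the ID, the update above would have forced the ID itself to change.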

The ID format we use in the Extranet product is the result of attempting to strike a balance between performance, uniqueness, and human readability. Also, since IDs are unique within a type, we introduced the notion of a type-specific ID prefix.
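
As a rough illustration of the idea, and not the actual Extranet implementation, the following Python sketch generates IDs made of a type prefix plus a zero-padded sequence number. The class name `PrefixedIdGenerator`, the prefixes, and the eight-digit width are all assumptions made for this example. Each prefix gets its own counter and lock, which is one simple way to keep different types from contending with each other.

```python
import itertools
import threading


class PrefixedIdGenerator:
    """Hypothetical generator: one sequence and one lock per type prefix."""

    def __init__(self, width=8):
        self._width = width            # digits in the numeric part (assumed)
        self._counters = {}            # prefix -> (counter, lock)
        self._registry_lock = threading.Lock()

    def _counter_for(self, prefix):
        # The registry lock is held only long enough to look up or create
        # the per-prefix counter.
        with self._registry_lock:
            if prefix not in self._counters:
                self._counters[prefix] = (itertools.count(1), threading.Lock())
            return self._counters[prefix]

    def next_id(self, prefix):
        counter, lock = self._counter_for(prefix)
        with lock:                     # serialize only within this type
            number = next(counter)
        return f"{prefix}{number:0{self._width}d}"


generator = PrefixedIdGenerator()
protocol_id = generator.next_id("PRO")   # prefix for protocols (made up)
grant_id = generator.next_id("GRT")      # prefix for grants (made up)
```

Nothing in the resulting ID depends on the object’s data, so it can be assigned at creation time and never needs to change.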

Often the biggest challenges in adopting a new ID convention aren’t technical at all. They’re human. When transitioning from one system to another (Click Extranet isn’t really different in this respect), there are a lot of changes thrust upon the user, and change is difficult for many. Users may have become accustomed to mentally decoding the ID to learn more about the object but, in my experience, avoiding the need to do that by keeping the values separate ultimately makes for a system that is easier to use and comprehend.

Cheers!

2 comments:

  1. Hi Tom - just wanted to register my support for the blog and let you know that I'll be reading. I'm always interested in your take on these issues. I also enjoyed the recap of C3 since I missed participating this year due to my move from NC to FL.

    If you're taking requests, I'd be interested in articles about Aspose integration and those real world AJAX examples you mentioned a little while back.

    In specific reference to this post (since that's where I'm commenting), one reason it can be tempting to add data to an ID in a Click system is that the ID is featured so prominently throughout the UI. This can be a pain with projects that reference other project types. For instance, if I have an animal study with an entity of type Facility, I can expose that property on a form and allow a user to select a Facility. The resulting entity link that shows on the page is a relatively meaningless ID to anyone viewing the page. Allowing us to set our own display values on Project Types (which I think is coming in 5.6) should help a lot in this regard.

    Scott Mann

  2. Thanks Scott. You'll be happy to know that in Extranet 5.6 you now have full control over which attribute is used as the display value for the entity.

    Sorry for the late response. For some reason I wasn't notified of your comment. I need to review my preferences. :-(
