The Case of The Year 2000
This case describes some problems with a common model of dates, a model which connects the structures and operations of many computer programs to the way humans and their institutions organize events. The problem with this model is that it reflects purposes and assumptions which have changed with the passage of time, so that the old model no longer fits the context of its application.

The computer press at the end of the 20th century expended a great deal of ink and paper and bits and bandwidth on millennial warnings regarding the "Year 2000 problem." Like other sorts of millennial warnings, these were based on a certain sort of numerology, but because the problem and its possible consequences derived solely from human inventions, the problem was easier to see, explain, and address than other millennial concerns.

The core of the problem was that many of the computer programs developed since the 1950s (when computer programs started being developed) have used two digits to represent the year, coding (for example) the year 1961 simply as the two digits "61." With the year 2000 looming, continuing this system of coding presented a host of difficulties, and there was a real concern that our world's interlinked computer systems would suffer recurring problems because of the way some of their components represented the year.

Two of the immediate consequences of this problem were ambiguity about dates and confusion about their ordering. The ambiguity problem was that with only the last two digits of the year, there was no way to tell the difference between the years 1961 and 2061 (or 1861 or 2161, to just get started!). The ordering problem was that even though 2001 comes after 1967, the two-digit encoding "01" is smaller than "67." Computers compare numbers numerically in order to compare dates chronologically, and the use of two-digit encodings after the year 2000 would introduce inconsistencies into those comparisons, leading the system to think that a date in the year 2001 comes before a date in the year 1967.
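
To make the two failures concrete, here is a minimal sketch in Python; the notation is my choice for illustration, not anything the systems of the time ran, and the particular years are just the ones used above.

    # Minimal sketch of the ambiguity and ordering problems with
    # two-digit years. Python is used only for illustration.

    def two_digit_year(year):
        # Keep only the last two digits, as the old encoding did.
        return year % 100

    # Ambiguity: distinct years collapse onto the same code.
    print(two_digit_year(1961) == two_digit_year(2061))    # True: both are 61

    # Ordering: the codes compare in the wrong chronological order.
    print(two_digit_year(1967) < two_digit_year(2001))     # False: 67 is not less than 1
    print(1967 < 2001)                                      # True: the full years disagree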

The consequences of this situation would be regrettable, and they could have been easily avoided by simply shifting to a full model of years, encoding 1961 with four or more digits. But the shift was not as simple as we might expect, due --- in part --- to the very constraints which created the problem in the first place.

How Did We Get Here?

Why did our technological culture have this problem in the first place? Why did the programmers who wrote those early programs use only the last two digits of years? Didn't they understand the problems it would eventually cause?

In their defense, if we were to make a list of people who've caused problems by not thinking fifty years ahead, the programmers would be pretty low on the list. Furthermore, no one expected computers to become as pervasive as they currently are, complicating both the effects of the choice and its repair at the end of the century. And the programmers were simply working with the constraints of the time.

The reason for the programmers' choice was not a lazy desire to avoid typing four digits (as in the provision, until recent years, of the prefix 19__ on many personal checks), but was based in the economics of early computers. Two decimal digits (e.g. 02 or 19 or 89) can be represented in a single "byte" of computer memory; four decimal digits require two bytes. Though this seems like a minor difference, there are two important things to remember: (1) in the 1950s, a byte of memory cost about half a dollar; and (2) the extra byte for four-digit dates needs to be multiplied by the number of dates being used by the program. If we are working with a database of 2,000,000 driver's license records, each with a date of issuance and a date of birth, we are talking about 4,000,000 bytes of memory saved by using just two decimal digits for the year. I hope the programmers who thought of the scheme at least got a small raise for saving those millions of dollars.
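
One common way to fit two decimal digits into a single byte is packed ("binary-coded") decimal, with each digit occupying four bits. The sketch below, in Python, is a generic illustration of the arithmetic, not the layout of any particular machine of the period.

    # Two decimal digits packed into one byte (binary-coded decimal),
    # versus two bytes for a four-digit year. Illustrative only.

    def pack_two_digits(tens, ones):
        # Each digit takes four bits; together they fill a single byte.
        return (tens << 4) | ones

    year_61 = pack_two_digits(6, 1)
    print(hex(year_61))                      # 0x61, and it fits in one byte

    # The savings described above: one byte per date, two dates per
    # record, two million records.
    print(2_000_000 * 2 * 1)                 # 4,000,000 bytes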

But this isn't all of the story, as the programmers of the time had other choices available to them. Understanding decisions or constructions from the past demands knowing what choices the decision makers or designers had. At the time, programmers had another option, one which they chose not to take but which has consequences for how we think about the problem faced today.

Historical and modern computers alike have at least two kinds of memory, distinguished by speed of access: at the time, it was the fast memory (which was still many thousands of times slower than today's) that cost half a dollar per byte. They also had a variety of slower memories --- magnetic drums and disks and tapes --- which were less expensive. It was an option at the time to use one byte of fast memory but two bytes of slower memory to represent dates; in the expensive active memory of the computer, only two digits of the year would be kept, while in the slower, persistent storage on magnetic surfaces of various sorts, the "whole" year would be encoded as two bytes.
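
Here is a sketch, in modern Python rather than the machine code of the day, of what that split might have looked like; the function names and the fixed 1900 base are my own illustrative assumptions.

    # The two-tier option: the full year lives in slow storage, and only
    # the working copy in fast memory drops the century. Illustrative only.

    def to_fast_memory(stored_year):
        # Translate on the way in: keep just the last two digits.
        return stored_year % 100

    def to_slow_storage(fast_year, century=1900):
        # Translate on the way out: restore the century. The fixed 1900
        # base is exactly the assumption that would later need changing.
        return century + fast_year

    print(to_slow_storage(to_fast_memory(1961)))   # 1961
    # Each translation is a small extra computation: cheap today, but
    # noticeable on machines hundreds of thousands of times slower.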

If the programmers of the day had chosen this strategy, it would just be necessary, today, to change the parts of the programs which accessed the faster memory, all of which have had to be changed anyway in the intervening years.

Why didn't they do this? Well, for one thing, there would be a computational cost associated with translating between the encoding schemes whenever information moved between kinds of memory. Computers were hundreds of thousands of times slower at the time, which meant that even a minor extra step could be expensive in terms of time. And partly, again, it was difficult to imagine that there would be a problem fifty years later, or that computers would become so pervasive and interconnected that the assumption could not easily be withdrawn and everything changed.

I've spoken of these past programmers as though they were a unified and coordinated group of engineers and scientists. In fact, they were spread across companies and laboratories around the (mostly Western) world, but they converged on similar designs for the "good engineering" reasons which I described above. Of course, their solutions weren't entirely similar but they tended to be pretty close. Some programmers, whose constraints were different (for instance, fewer database records), probably used two or more bytes to represent the year, while others took the intermediate course I described above. But enough decisions were made, for the best of reasons, to go with two-digit models, that we face this present --- if relatively minor --- crisis.

Saving Time

At the turn of the century, what possible solutions were there to the "Year 2000 problem"? I'll not try to solve past problems here, but I am hoping to use the example to illustrate some of the problems in changing or transforming the models we (or our computers) use.

A first and temporary solution would be to simply stay with the current scheme and use codes like 02 to represent 2002. It would probably be a decade or two before we ran into serious ambiguity problems (considering thirty-year mortgages and the oldest of current computer records), but we would still have the problem of inconsistency in the ordering of dates. Comparing 02 and 61 numerically places 1961 after 2002, which would cause all sorts of problems.

Another solution would be to use the single byte of memory but to represent years by counting from the year 1900. In this scheme, 2002 would be encoded as 102 rather than 02, avoiding ambiguity and preserving order. Since a single byte can represent any number from 0 to 255, this would keep us going well into the 22nd century (through 2125, to be exact, if we keep our thirty-year horizon, since the last representable year is 1900 + 255 = 2155). If we were saddled with the same practical constraints as designers in the mid-20th century, we might choose this option for a time.
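
Here is a sketch of that counting scheme in Python; the range check and the printed examples are mine, chosen to match the numbers above.

    # Offset scheme: store the year as a count from 1900, which still
    # fits in a single byte (0 through 255) until the year 2155.

    def encode_year(year):
        offset = year - 1900
        assert 0 <= offset <= 255, "year does not fit in one byte"
        return offset

    def decode_year(offset):
        return 1900 + offset

    print(encode_year(2002))                        # 102, not an ambiguous 02
    print(encode_year(1961) < encode_year(2002))    # True: ordering is preserved
    print(decode_year(255))                         # 2155, the last representable year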

However, we're lucky, and computer technology has advanced at amazing rates. Modern databases either represent the year in full or, with more detail, encode times as seconds or fractions of a second past an arbitrary reference point and figure out years from that. This takes advantage of the fact that today's computers are many thousands of times faster than their predecessors, so the penalty for such calculations is much lower than it once was. Indeed, one of the most common solutions to the year 2000 problem was the automatic translation of dates in "legacy databases" (so called because they are inheritances we are stuck with) into modern formats. Curiously, this is the exact opposite of the intermediate approach which I discussed above as an option for early programmers. Rather than having a model describing multiple centuries in the database and a model assuming a single century in the computer, this approach uses a multi-century model in the computer to describe a single-century model in the hard-to-change database.
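
The sketch below, using Python's standard datetime module, shows the flavor of the modern, multi-century representation and of translating a legacy two-digit year into it; the single-century assumption in the conversion is the one described above, and the specific dates are my own illustrative choices.

    # Modern style: a time is kept as seconds past a fixed reference point,
    # and the full calendar date (century included) is computed on demand.
    from datetime import datetime, timezone

    moment = datetime(1961, 7, 4, tzinfo=timezone.utc)
    print(moment.timestamp())      # seconds relative to 1970-01-01 (negative here)
    print(moment.year)             # 1961, with the century intact

    # Translating a legacy two-digit year into the multi-century model,
    # assuming (as the stored model did) that every stored date falls in
    # the twentieth century. Real conversion tools used more careful rules.
    def legacy_to_full_year(two_digits):
        return 1900 + two_digits

    print(legacy_to_full_year(61))   # 1961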

Lessons

What are the lessons learned from this very simple example? First, the form of the problem is one I will be returning to throughout this book. Models are imperfect versions of reality and we can see this as we examine them against a background which lacks those imperfections (but still has others). We can only effectively understand a model as an interface by using this background to describe the two sides which this interface connects.

In this particular case, many computer programs use a model of time which ignores or hides certain information, namely the century of a given date. If one's purpose is processing information regarding the 20th century, this is perfectly reasonable and --- given other constraints such as cost and complexity --- quite an acceptable assumption. However, it breaks down as we cross the turn of the century. Because we as humans have access to information (whole dates) which the computer's models ignore, we can see the problem and anticipate the breakdown in the computer's models.

Looking closer, we've learned that assumptions built into a model early on may have substantial consequences for the future. Those assumptions are usually made for the best of reasons, given the original context of the model's development. They often reflect details of the system for which the model was originally intended (computers where memory was dreadfully expensive).

We also learned that the constraints and problems of dealing with multiple models can sometimes be solved by translation, but translation incurs a basic cost which has to be weighed against the other constraints and costs.

In the next case, I will look at everyday human understanding of time, contrasting one aspect of the way in which children and adults understand and answer questions about time. In a fashion much more complex than the computer's, children's and adults' understandings of time also hide and highlight different properties. And by holding their understanding against the background of a different model of time, we can understand a little more about the way models work.

Copyright (C) 1997, 1998 by Kenneth Haase
Draft, not for citation or circulation