Where did source code merging come from?

The concept of merging changes is central to modern source code control. Specifically, three-way merge tools like diff3 let two developers change the same file and then reconcile their changes. In the bad old days, before we had good merge tools, locking a file while you edited it was common practice: “Yeah, I’ll be done in about three days, then you can do your work.” The key insight of CVS in 1990 (and of others) was that it was OK to let people work on the same thing at the same time, as long as any conflicts could be reconciled later.

So where did the idea of three-way merging come from? The oldest code I’ve found is diff3 from V7 Unix, written by Doug McIlroy and dated January 1979.

Wikipedia doesn’t have any history for diff3, but the entry for diff has a helpful history of the basic two-way diff. It says diff goes back to the early 1970s, again by McIlroy, building on ideas from other computer scientists. There’s a short technical paper from 1976 about diff, but it doesn’t mention three-way diffs.

That same diff page has an intriguing reference:

Diff3 compares one file against two other files. It was originally developed by Paul Jensen to reconcile changes made by two people editing a common source. It is also used internally by many revision control systems for merging.

Unfortunately, I can’t figure out who this Paul Jensen is. There’s no citation. The text was already in Wikipedia back in 2004, added by someone from the University of Vermont while fleshing out the history section, but I couldn’t learn anything more.

The use of three-way merge in version control dates back at least to 1985, when it’s mentioned in Tichy’s paper on RCS. SCCS dates back to 1972, but I didn’t look into the history of its merge tools. (A 1975 article seemed to say merging was dangerous.)

(Surely there are plenty of people alive today who have personal knowledge of the answer to my question, but I’m too shy to bother them with my idle curiosity.)

Big update with authoritative info

My old Google pal Dan Bentley was kind enough to ask his father (a former Bell Labs person) to ask Doug McIlroy himself about the history of diff3. Here’s the reply Doug wrote, edited a bit:

Paul Jensen’s role was to say, “I’ve wanted a program that …” As far as I know he never made one. But Paul’s nudge was more than sufficient to spur the creation of a candidate program for the purpose. Is it good enough? I don’t know; it strays from an important Unix lesson: programs tend to be better if their authors use them.

There’s an obvious 3D analog for the 2D longest-common-subsequence dynamic program; and the improvements discovered by Harold Stone, Hunt/Szymanski (Harry Hunt, not my coauthor) and Gene Myers likely have analogs too. Whether I considered such–or even thought of them–at the time I can’t remember. Diff3 works by combining results of 2D diffs. This keeps it at O(n^2) worst case for two sequences of length n vis a vis O(n^3) for a true LCS of three.

I’m curious about where Wikipedia learned of Jensen’s role in diff3, for I can’t think of any description of diff3 other than Unix man pages.

Paul was (and still is) a creative type from a development department, who took a sabbatical in 1127. Paul moved on to the Denver Labs, then set up shop in Boulder, .

The details of the output report would be mine; such things don’t get worked out in corridor conversations.

Presumably diff3 has been reimplemented out from under Unix licensing, but if it still has that name, then it has proved Paul’s instinct right. And it gratifyingly proves me wrong in having thought that diff3 “persists mainly as a curiosity”.
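McIlroy’s point that diff3 “works by combining results of 2D diffs” is easy to see in a small example. What follows is a minimal Python sketch, assuming line-based inputs. It is not McIlroy’s code and doesn’t reproduce diff3’s output format. It runs the classic O(nm) longest-common-subsequence dynamic program twice, once comparing the common base against each edited copy. Base lines matched in both diffs serve as sync points. Each region between sync points is taken from whichever side changed it, or flagged as a conflict when both sides changed it differently. The names `lcs_matches` and `merge3` and the `<<<<<<<`-style conflict markers are illustrative choices of mine.

```python
from typing import List, Tuple


def lcs_matches(a: List[str], b: List[str]) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of one longest common subsequence of a and b.

    Classic 2D dynamic program: O(len(a) * len(b)) time and space.
    """
    n, m = len(a), len(b)
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the top-left corner to recover the matched pairs.
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


def merge3(base: List[str], ours: List[str], theirs: List[str]) -> Tuple[List[str], bool]:
    """Simplified three-way merge built from two 2D diffs (base->ours, base->theirs)."""
    to_ours = dict(lcs_matches(base, ours))      # base index -> ours index
    to_theirs = dict(lcs_matches(base, theirs))  # base index -> theirs index
    # Sync points: base lines that survive in both copies,
    # plus a sentinel at len(base) so the trailing region is handled too.
    syncs = [k for k in range(len(base)) if k in to_ours and k in to_theirs]
    syncs.append(len(base))

    merged: List[str] = []
    conflict = False
    o = a = b = 0  # start of the current region in base, ours, theirs
    for k in syncs:
        ka = to_ours.get(k, len(ours))
        kb = to_theirs.get(k, len(theirs))
        o_chunk, a_chunk, b_chunk = base[o:k], ours[a:ka], theirs[b:kb]
        if a_chunk == o_chunk:      # ours left this region alone: take theirs
            merged += b_chunk
        elif b_chunk == o_chunk:    # theirs left it alone: take ours
            merged += a_chunk
        elif a_chunk == b_chunk:    # both sides made the identical edit
            merged += a_chunk
        else:                       # both sides changed it differently
            conflict = True
            merged += ["<<<<<<<", *a_chunk, "=======", *b_chunk, ">>>>>>>"]
        if k < len(base):
            merged.append(base[k])  # the shared sync line itself
        o, a, b = k + 1, ka + 1, kb + 1
    return merged, conflict


if __name__ == "__main__":
    base = ["a", "b", "c", "d"]
    ours = ["a", "B", "c", "d"]    # one developer edits line 2
    theirs = ["a", "b", "c", "D"]  # the other edits line 4
    print(merge3(base, ours, theirs))  # → (['a', 'B', 'c', 'D'], False)
```

The design mirrors the trade-off McIlroy describes. Each call to `lcs_matches` fills an n-by-m table, so the merge stays O(n^2), where a true three-way LCS would need an O(n^3) table. The price is that stitching two pairwise diffs together can, in principle, line up regions differently than a true three-way alignment would.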
