Derivative/collective works and OSL
chuck at codefab.com
Sun Feb 6 22:08:40 UTC 2005
John Cowan wrote:
[ ... ]
> There is a similar dispute, though not so problematic, at the other end:
> does mere compiling of source code create a derivative work, or is the
> object code the original work in a different medium, as a paperback
> book is the same work as a hardback original?
> Nobody knows the answer to that either.
It may well be true that the courts have not considered a dispute involving
this specific issue, or resolved one in a way that sets a clear standard or
precedent.
However, there exists a branch of software engineering known as compiler
design, and there exist experts in that field who have written working
compilers who share a common understanding of how a compiler toolchain
operates: compilers perform a mechanical, deterministic, and reversible
transformation on source code to produce object code.
By definition, this transformation does not change the semantic meaning of the
program and does not involve human decision-making or any possibility of
creative expression.
To use your analogy as a starting point, consider taking a book and
translating it to another language. For human languages, this is a creative
process, since there can be many ways to translate something; it is not
deterministic, since two translators often produce noticeably different
output; and it is not reversible: if you translate a
sentence from English to Russian, and then from Russian back to English, it is
very likely that what you get back is not the same as the original work.
[ A classic example from NLP was: "The spirit is willing, but the flesh is
weak." became "The vodka is good but the meat is rotten." ]
Computer languages are unlike human languages: they possess well-defined
semantics, and their grammars are deliberately unambiguous (parsable by
techniques such as LR(1) or LALR(1)), so that well-formed source code has one
and only one meaning when compiled. You can compile a source code file with one compiler
into an object file, decompile the object file via a disassembler or debugger
like gdb, and then recompile that result into a new object file using a
different compiler. You will end up with a program that has the exact same
behavior and meaning as the original program.
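The deterministic, mechanical character of compilation can be sketched in
miniature. The following uses Python's built-in compile() as a hedged
stand-in for a C toolchain (it does not claim every real-world build is
bit-reproducible in all configurations): compiling the same source twice
yields identical output, and that output preserves the source's meaning.

```python
# Sketch: compilation as a mechanical, deterministic transformation.
# Compiling identical source twice produces byte-for-byte identical bytecode.
source = "def square(x):\n    return x * x\n"

code1 = compile(source, "<demo>", "exec")
code2 = compile(source, "<demo>", "exec")

assert code1.co_code == code2.co_code   # same input, same output

# The compiled form has the same meaning as the source: executing it
# defines a function with exactly the behavior the source describes.
ns = {}
exec(code1, ns)
assert ns["square"](7) == 49
```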
The process of compiling software is thus very similar to photocopying an
original document, and then photocopying the copy. With analog reproduction,
the process is lossy (the "Xerox effect", where a copy of a copy becomes
blurry compared with the original), but a digital process suffers no
generation loss.
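The absence of digital generation loss is easy to demonstrate. Here is a
minimal sketch: make a hundred successive "generations" of byte-for-byte
copies and confirm, via a cryptographic hash, that the last copy is identical
to the original.

```python
import hashlib

# Digital "photocopying": copying bytes any number of times is lossless.
original = b"object code produced by a compiler"

copy = original
for _ in range(100):        # 100 successive generations of copies
    copy = bytes(copy)      # each iteration makes a fresh duplicate

digest = lambda data: hashlib.sha256(data).hexdigest()
assert digest(copy) == digest(original)   # no generation loss
```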
> The reason it matters is that pretty much everyone agrees that a tarball
> is a collective work,
If I put a book-- a single work, written by a single author-- into a box and
mail that box, the box only contains a single work. If I put two books into
the box, then there are two works in the box, but that does not mean the box
is a collective work: it is a mere aggregation of two components which are
distinct and can be handled separately without any confusion.
The tape archive format, or tarball, is a method of packaging content for
shipment over the network or for convenient long-term storage, just as the box
used for the sake of example is a convenient method for packaging content
for shipment via the postal service.
A tarball of a single work is an archive containing a single work, not a
collective work. A tarball of two separate works is an archive of two
separate works, which is a simple aggregation and not a collective work.
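The "tarball as cardboard box" point can be shown directly with Python's
standard tarfile module (a sketch with made-up file names): archiving two
files and extracting them returns the originals byte-for-byte, with nothing
merged or transformed along the way.

```python
import io
import tarfile

# Two distinct "works" (hypothetical names, for illustration only).
works = {"book_one.txt": b"First work, by author A.",
         "book_two.txt": b"Second work, by author B."}

# Pack them into an in-memory gzipped tarball.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    for name, data in works.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# Unpack it again: the archive is pure packaging.
buf.seek(0)
with tarfile.open(fileobj=buf, mode="r:gz") as tar:
    extracted = {name: tar.extractfile(name).read() for name in works}

assert extracted == works   # each work comes back unchanged and distinct
```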
> ...and if when compiled it is still a collective work,
> then it is not derivative of any of the works contained in the tarball.
You can't compile a tarball without extracting its contents, any more than
you could read a book mailed to you in a box without first opening the box.
Is a photocopy of a document considered a derivative work, or is it considered
to be the same thing as the original work for practical and legal purposes?
[ A possible objection: if the code being compiled has a bug that results in
undefined behavior, the compiler is allowed to produce different results when
invoked with different optimization flags, or results that differ from the
output of another compiler. While true, this does not refute my argument: for
code which does not invoke undefined behavior, the compiler's output is
required not to diverge in behavior. ]
Page 586 of _Compilers: Principles, Techniques, and Tools_ by Aho, Sethi, and
Ullman states: "First, a transformation must preserve the meaning of
programs. That is, an 'optimization' must not change the output produced by a
program for a given input, or cause an error, such as a division by zero, that
was not present in the original program. The influence of this criterion
pervades this chapter; at all times we take the 'safe' approach of missing an
opportunity to apply a transformation rather than risk changing what the
program does."
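The same criterion can be observed in miniature with Python's compile(),
whose 'optimize' keyword is a rough analogue of a C compiler's -O flags (a
hedged sketch, not a claim about any particular C compiler): for code whose
behavior is well defined, changing the optimization level must not change the
program's output.

```python
# Sketch: optimization level does not change the meaning of well-defined code.
source = "def total(xs):\n    return sum(x * 2 for x in xs)\n"

ns0, ns2 = {}, {}
exec(compile(source, "<demo>", "exec", optimize=0), ns0)  # unoptimized
exec(compile(source, "<demo>", "exec", optimize=2), ns2)  # most optimized

data = [1, 2, 3, 4]
assert ns0["total"](data) == ns2["total"](data) == 20   # identical behavior
```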
[ The vast majority of archives found on various FTP and Web sites contain a
single work composed of one or more source code files. There are a few cases
where a tarball contains several works, such as nmap shipping with libpcap, or
Python coming with expat, but it is easy to see that these are separate
works because the archive keeps them in separate directory hierarchies. ]
I suppose it would be possible to rip out all of the pages from two books, and
mix them together on a chapter-by-chapter or page-by-page basis to form a new
work which actually was a single indivisible compilation, just as it would be
possible to mix all of the files of two software projects together to form a
new work, but that is certainly not the normal case.