licensing data

scott cotton scott at
Mon Nov 8 14:15:41 UTC 2004

I'm interested to know if anyone has pointers, information, or opinions on
what "open source" means in the case of licensing data. Currently, this
is just a curiosity for me. I've encountered several distributions of data
collections that are used by software systems, and thus far I've only seen:

1) licenses for software bundled with the data, but not licenses for the data
itself. For example, the dictionaries that come with GNU Aspell or the
OS TCP signature database in Nmap.
2) proprietary licenses for data collections. For example, the Penn Treebank.
3) licenses for documentation collections, rather than for data collections
that software makes use of.

I'm interested in the question of what open source would mean for 
collections of data intended for use with software (as in 1 and 2 above) but 
with a license that is dedicated solely to the data collection (as in 2 above).

For me, the most obvious questions concern:

0) Does the OSI have an interest in determining what open source means
for data collections? Or has this already been considered? (If so, sorry;
I couldn't find it.)

1) What constitutes a derived work? Does software that makes use of the data
collection fall under this rubric? What about other data collections that are
created by using the data collection but do not include it, or include it only
in part?

2) What constitutes a licensable data collection? For example, if
I create a full-text index of some collection of documents that is publicly
available, and distribute it under some copyright license in my name, and
someone else does the same thing and distributes their data under their name
with a license incompatible with mine, but both collections of data are
identical, does that constitute a breach of copyright? I guess I could restate
this as: is it the contents of the data collection that is copyrightable, or
the fact that an effort was undertaken to collect the data?

While questions 1 & 2 aren't specific to open source, they seem important
for determining what an open source license for a collection of data would be. 

Any comments, pointers, etc. gratefully appreciated.



(Of course, I'm aware that discussions on this list don't constitute legal
advice from a lawyer.)
