[License-discuss] objective criteria for license evaluation

Mon Dec 10 17:23:57 UTC 2012

On Mon, Dec 10, 2012 at 2:57 AM, Gervase Markham <gerv at mozilla.org> wrote:
> On 09/12/12 18:46, Luis Villa wrote:
>>
>> So let me restate the question to broaden it a bit. If you had a
>> *blue-sky dream* what subjective information would you look at?

By the way, I think this was probably obvious from the rest of the
email, but I meant *objective* here.

>> For example, if you had the resources to scan huge numbers of code
>> repositories, what numbers would you look for?
>>
>> * ranking by LoC under each license
>> * ranking by "projects" under each license
>> * ... ?
>
>
> If we are blue-sky dreaming, then I would like to rank by "_useful_, unique
> lines of code under each license". "Useful" in the sense that some
> half-finished barely-compiling "my first Windows CD player" on Sourceforge
> counts for nothing, whereas jQuery counts for a lot. "Unique", in the sense
> that I shouldn't be able to game the stats by going to github and forking
> every project with my preferred license.
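
The "unique" half at least looks mechanically approachable. A minimal sketch, assuming a crawler has already produced (license, source lines) pairs: the function name, the input shape, and the choice of per-line hashing are all hypothetical, and whichever project is seen first gets the credit for a shared line.

```python
import hashlib
from collections import defaultdict


def unique_loc_by_license(projects):
    """Count non-blank lines per license, crediting each distinct line once.

    `projects` is an iterable of (license_name, list_of_source_lines)
    pairs. This input shape is an assumption for illustration only.
    """
    seen = set()               # hashes of lines already credited to some license
    counts = defaultdict(int)  # license name -> number of unique lines
    for license_name, lines in projects:
        for line in lines:
            text = line.strip()
            if not text:
                continue  # ignore blank lines
            digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
            if digest in seen:
                continue  # duplicate, e.g. from a fork of an earlier project
            seen.add(digest)
            counts[license_name] += 1
    return dict(counts)
```

Forking a project wholesale adds nothing to the count under this scheme, though it also credits common one-liners to whoever happens to be crawled first, so it is only a starting point.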

How do we define "useful" objectively? Size is the obvious,
plausibly obtainable proxy for "useful": "projects over X LOC" or
something like that. I suppose if you had a custom crawler that had
knowledge of git/svn/cvs/etc., you could do "projects over 5
committers" or "projects with over 100 commits" or something along
those lines. Richard suggests community size, which would be great but
is probably not computable, no matter how many people or how much money
you throw at it.
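
To make the size proxies concrete, here is a rough sketch of the filter, assuming the crawler has already reduced each repository to a few counters. `ProjectStats`, `looks_useful`, and `rank_licenses` are hypothetical names, and the LOC threshold is left as a required argument because "X" is not pinned down above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ProjectStats:
    """Per-repository numbers a crawler might collect (hypothetical shape)."""
    name: str
    license: str
    loc: int
    committers: int
    commits: int


def looks_useful(p, min_loc, min_committers=5, min_commits=100):
    """True if the project clears any one of the size proxies."""
    return (
        p.loc > min_loc
        or p.committers > min_committers
        or p.commits > min_commits
    )


def rank_licenses(projects, min_loc):
    """Rank licenses by number of projects that pass the proxy filter."""
    counts = Counter(p.license for p in projects if looks_useful(p, min_loc))
    return counts.most_common()
```

Treating the three proxies as alternatives (any one suffices) is a design choice for the sketch; requiring all three would be just as easy to express and much stricter.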

It may be that in practice, objective information has to be stored in
the same revision control system as the relevant license information.
Otherwise you're not talking about something that can be
crawled/computed; you're talking about something that requires human
intervention, which, even if it is objective, still limits your sample
size.

>> Similarly, if you could declare objective criteria for textual license
>> analysis and had the time/resources to read all of them, what would
>> those criteria be? e.g.,
>>
>> * has/has not been retired by the author
>
> This is important; however some licenses such as the HPND have no identified
> author, but yet are deprecated.

Deprecated by *who*? :) (Note that we don't even have a "deprecated"
category right now; we've only gotten as far as "redundant with more
popular licenses.")

>> * has/has not been obsoleted by a new license published by the same author
>
> - one can imagine a license which has been obsoleted by its author but is
> still in wide use, and even specifically chosen over newer versions (e.g.
> GPL 2)
>
>> * has/doesn't have an explicit patent grant
>
> - I am of the view that even if the OSI finds it impossible politically to
> recommend specific licenses, it should try and get to a place where it can
> recommend license features - with an explicit patent grant being in pole
> position.

Any others?

>> * ... ?
>
> I think there is also a place for "lawyers generally think it's vague and
> has sub-optimal word choice", which might apply to e.g. Artistic v1.

As Richard points out, it is very hard to imagine how to make this
objective, but I'd encourage folks to think creatively about it.

>> * Plays well with other popular licenses. We now have a "can use in"
>> progression which goes:
>
> MIT/BSD -> Apache 2 -> MPL 2 -> LGPL 3 -> GPL 3 (-> AGPL 3)
>
> (Those GPL numbers could be 2 rather than 3 if there was a warning about the
> Apache2/GPL2 incompatibility which the FSF asserts.)
>
> If your code doesn't slot somewhere into that ecosystem, you are (IMO)
> significantly reducing the likelihood of it gaining widespread use, all
> other things being equal.

I like the intuition here, but I'd like to push us to think about more
objective criteria: what does it mean to "play nicely"? Presumably
"compatible", but who determines compatibility? What does it mean? Can
that be determined objectively?

Plays nicely with what other popular licenses? EPL is popular, for example.
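
For what it's worth, the progression quoted above is easy enough to encode; the hard part is everything outside it. A minimal sketch, assuming the chain is taken exactly as Gerv wrote it (including the parenthesized AGPL 3 step) and is not treated as a legal determination; `CAN_USE_IN` and `can_use_in` are hypothetical names.

```python
# The "can use in" progression as quoted in this thread, left to right.
# An ordering asserted in the email, not a legal conclusion.
CAN_USE_IN = ["MIT/BSD", "Apache 2", "MPL 2", "LGPL 3", "GPL 3", "AGPL 3"]


def can_use_in(source, target):
    """Say whether code under `source` can be used in a `target` project.

    Returns True or False when both licenses are in the chain, and None
    when either one is not, because the chain says nothing about it.
    """
    if source not in CAN_USE_IN or target not in CAN_USE_IN:
        return None
    return CAN_USE_IN.index(source) <= CAN_USE_IN.index(target)
```

Here `can_use_in("MIT/BSD", "GPL 3")` is True and `can_use_in("GPL 3", "Apache 2")` is False, but `can_use_in("EPL", "GPL 3")` is None: the list is only as objective as whoever decided what goes in it.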

Luis