[License-discuss] notes on a systematic approach to "popular" licenses

Tue Jan 10 22:33:57 UTC 2017

Luis

Thanks for keeping this discussion alive. My comments:

As for popular licenses, I generally agree with your suggestions. I
would also argue that coming up with a list of de facto most popular
licenses shouldn't be as bitterly controversial as you seem prepared
for, or as the history of this discussion might suggest.

The fact is, whatever method you choose, you should expect to see a
roughly exponentially decreasing curve. Typically some threshold
emerges, above which the most popular items are clearly ahead of
everyone else (cf. the 80/20 Pareto rule).

Purely for the sake of this discussion, if we look at Black Duck's
list: https://www.blackducksoftware.com/top-open-source-licenses

...I can see 2-4 such thresholds:
Threshold 1: The 3rd license is 7 percentage points ahead of the 4th.
Threshold 2: The 4th license is 3 percentage points ahead of the 5th.
Threshold 3: The 8th license is 2 percentage points ahead of the 9th.
Threshold 4: Every license with a share of at least 1%.

Of course, from the above set, people would still lobby for a
threshold including their favorite license or excluding one they
dislike. But it is nevertheless a limited set to argue about, that
arises naturally from the data.
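The threshold idea above can be made mechanical. What follows is a minimal sketch, assuming the survey results are available as (license, percentage share) pairs. The function names `find_gap_thresholds` and `above_floor` and the `min_gap` parameter are hypothetical, and no actual survey figures are included here.

```python
from typing import List, Tuple

def find_gap_thresholds(shares: List[Tuple[str, float]], min_gap: float) -> List[int]:
    """Return the ranks r (1-based) where the r-th most popular license leads
    the (r+1)-th by at least min_gap percentage points."""
    # Sort by share, most popular first.
    ranked = sorted(shares, key=lambda item: item[1], reverse=True)
    cuts = []
    for i in range(len(ranked) - 1):
        gap = ranked[i][1] - ranked[i + 1][1]
        if gap >= min_gap:
            # Index i is rank i + 1, so the cut falls just below rank i + 1.
            cuts.append(i + 1)
    return cuts

def above_floor(shares: List[Tuple[str, float]], floor: float) -> List[str]:
    """Return the names of licenses whose share is at least `floor` percent,
    most popular first (the "1% or higher" style of threshold)."""
    ranked = sorted(shares, key=lambda item: item[1], reverse=True)
    return [name for name, share in ranked if share >= floor]
```

Given a list of cuts, the licenses above the tightest threshold are simply the top `cuts[0]` entries of the ranked list. Which `min_gap` to use is exactly the part people would still argue about. The code only guarantees that the candidate cut points come from the data.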

Furthermore, your suggestion of also listing new licenses should help
ease the pressure. For example, nobody could dispute that the licenses
above the tightest threshold (MIT, GPLv2 and Apache) are the most
popular ones. They also happen to cover exactly the three types people
most often look for: a short and permissive "BSD style" license, a
strong copyleft license, and a long permissive license that includes,
for example, a clear patent license. So even if someone wants to argue
for a list longer than three, this threshold, which emerges from the
data, already lands us in a very useful place.

Now, one could of course criticize this and say that a list of popular
licenses must include at least GPLv3 and BSD (or LGPL, or something
else). But GPLv3 could in this case be featured in the list of new
licenses, with the explanation that it is an update to GPLv2 that has
not yet overtaken it in popularity. BSD, on the other hand, is perhaps
not worthy of being on a most-popular list, based on this data? The
data clearly suggests that MIT is the most popular license in this
family, even if "BSD style" is the name for the category you hear most
often.

In fact, adding the concept of license families might again help ease
some of the pressure in this discussion, and also be genuinely useful.
For example, the entry for GPLv2 should list all other licenses in the
GPL family (both v2 and v3), and the entry for MIT could then link to
a list of other BSD-style licenses. I don't know whether the Apache
License has any such siblings at the moment.

A few points on the list of new licenses. I think your idea and
criteria are sound. A nice addition to your proposal might be some
context from historical statistics: how many licenses approved in the
last, say, 10 years were not legacy, redundant, special-purpose, etc.?
Perhaps the number is small enough that they could all be added to
such a list?

I would then add an expiration date to this list, for example 10
years. The point of the expiration date, of course, is that a license
on the "new" list should become a popular license within that time; if
it doesn't, it is no longer featured. (The two parameters are linked:
the lower the threshold for the popular list, the shorter the
expiration period for the new list can reasonably be.)
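The expiration rule could be checked mechanically at each periodic review. What follows is a minimal sketch, assuming each new-list entry records the date it was added and that the current popular list is available as a set of license identifiers. `NewListEntry`, `review_new_list`, and the 10-year default are illustrative assumptions, not an existing OSI tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List, Set

@dataclass
class NewListEntry:
    license_id: str  # hypothetical identifier for a license on the "new" list
    added: date      # date the license was placed on the "new" list

def full_years_between(start: date, end: date) -> int:
    """Number of complete years from start to end."""
    years = end.year - start.year
    # Subtract one if this year's anniversary has not been reached yet.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years

def review_new_list(entries: Iterable[NewListEntry], popular_ids: Set[str],
                    today: date, max_years: int = 10) -> Dict[str, List[str]]:
    """Sort new-list entries into graduated, still-new, and expired groups."""
    result: Dict[str, List[str]] = {"graduated": [], "still_new": [], "expired": []}
    for entry in entries:
        if entry.license_id in popular_ids:
            # Made it onto the popular list, so it leaves the new list.
            result["graduated"].append(entry.license_id)
        elif full_years_between(entry.added, today) >= max_years:
            # Did not become popular within the allotted time.
            result["expired"].append(entry.license_id)
        else:
            result["still_new"].append(entry.license_id)
    return result
```

Lowering `max_years` tightens the rule. That setting is the knob that would be tuned together with the popular-list threshold.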

henrik

On Tue, Jan 10, 2017 at 6:07 PM, Luis Villa <luis at lu.is> wrote:
> [Apparently I got unsubscribed at some point, so if you've sent an email
> here in recent months seeking my feedback, please resend.]
>
> Hey, all-
> I promised some board members a summary of my investigation in '12-'13 into
> updating, supplementing, or replacing the "popular licenses" list. Here
> goes.
>
> tl;dr
> I think OSI should have a data-driven short license list with a replicable
> and transparent methodology, supplemented by a new-and-good(?) list that
> captures licenses that aren't yet popular but are high quality and have some
> substantial improvement that advances the goals of OSI.
>
> Purposes of non-comprehensive lists
> If you Google "open source licenses", OSI pages are the top two hits.
> Historically, those pages were not very helpful unless you already knew
> something about open source. Having a shorter "top" list can help make the
> OSI website more useful to newcomers by suggesting a starting place for
> their exploration and education about open source.
>
> In addition, third parties often look to OSI as a trusted (neutral?) source
> for "top" or "best" licenses that they can incorporate into products. (The
> full OSI-approved list is not practical for many applications.) For example,
> if OSI had an up-to-date short list, it might have been the basis for
> GitHub's license chooser.
>
> A list based purely on popularity would freeze open source at a particular
> point in time, likely making it hard for new licenses with important
> innovations to get adoption. However, a list based on more subjective
> criteria is hard to create and update.
>
> Past attempts
>
> The proliferation report attempted to address this problem by categorizing
> existing licenses. These categories were, intentionally or not, seen as the
> "popular or strong communities list" and "everything else". Without a
> process or clear set of criteria to update the "popular" list, however, it
> became frozen in time. It is now difficult to credibly recommend the list to
> newcomers or third parties (MPL 1.1 is deprecated; GPL v3, #4 in Black Duck's
> survey, is not mentioned; etc.).
>
> There was also substantial work done towards a license "chooser" or
> "wizard". However, this runs into some of the same problems - either the
> chooser is opinionated (and so pisses off people, and potentially locks the
> licenses in time) or is borderline-useless for newcomers (because it still
> requires substantial additional research after using it).
>
> Data-driven "popular" list
>
> With all that in mind, I think that OSI needs a (mostly) data-driven
> "popular" shortlist, based on a scan of public code + application of
> (mostly?) objective rules to the outcome of that scan.
>
> To maintain OSI's reputation as being (reasonably) neutral and independent,
> OSI should probably avoid basing this on third-party license surveys (e.g.,
> Black Duck) unless their methodologies and data sources are well-documented.
> Ideally someone will write code so that the "survey" can be run by OSI and
> reproduced by others.
>
> Hard decisions on how to collect and "process" the data will include:
>
> - Choice of data sources: What data sources are drawn on? Key Linux
>   distros? GitHub? Per-language repos like Maven, CPAN, npm, etc.?
> - What are you counting? Projects? (May favor small, throwaway projects?)
>   Lines of code? (May favor the largest, most complex projects?) ... ?
> - Which license tools? Some scanners are more aggressive in trying to
>   identify something, while others prefer accuracy over comprehensiveness.
>   In 2013 there was no good answer to this, but my understanding is that
>   FOSSology now has three different scanners, so for OSI's purposes it may
>   be sufficient to take those three and average.
>
>   Could throw in Black Duck or other non-transparent surveys as a fourth,
>   fifth, etc.?
>
> - New versions? If a new version exists but isn't widely adopted yet, how
>   does the list reflect that? e.g., MPL 1.1 still shows up in Black Duck's
>   survey; should OSI replace 1.1 with 2.0 in the "processed" list? What
>   about GPL v2 v. v3? BSD/MIT v. UPL?
> - Gaps/"mistakes": What happens when the board thinks the data is
>   incorrect? :) e.g., should ISC be listed?
>
> Part of why we didn't go very far in 2013 is because there are no great
> answers for these - different answers will reflect different values, and
> have different engineering impact. They're all hard choices for the board,
> the developers, hopefully license-discuss, and perhaps a broader community.
>
> Hat tip: Daniel German was invaluable to me in thinking through these
> questions.
>
> Supplementing with high-quality, value-adding options
>
> To encourage progress, while still avoiding proliferation, I'd suggest a
> second list of licenses that are good but not (yet?) popular. "Good" would
> be defined as something like:
>
> 1. meets the OSD
> 2. isn't on the data-driven popularity list
> 3. drafted by an attorney (at minimum) or by a collaborative, public
>    drafting process with clear support from a sponsoring-maintaining
>    organization (ideal)
> 4. has a new "feature" that is firmly in keeping with the overall goals of
>    open source and can be concisely explained in a few sentences (e.g., for
>    UPL, "GPL-compatible permissive license with explicit patent grant")
>    - but not "just for a particular community": it has to be at least
>      plausibly applicable to most open source projects
>    - this is unavoidably subjective; suggest having it fall to the board
>      with pre-discussion on license-review.
>
> #4 allows for some innovation (and OSI support of such innovation) while #3
> applies a quality filter. (Both #3 and #4 have anti-proliferation effects.)
> Hopefully licenses that meet #3 and #4 would eventually move into #2, but
> you could imagine placing a time limit on this list; if you're not in the
> top 10 most popular within five years, then you get retired? But not sure
> that's a good idea at all - just throwing it out as one option.
>
> If a new license meets #1, but not #3 and #4, then OSI's formal policy
> should be to approve, but bury it in one of the other proliferation list
> groups. (Those groups are actually quite good, and should be fairly
> non-controversial, once you have a good policy for what gets into the more
> "favored" groups.) I don't think a new "deprecated" group is necessary; the
> proliferation categories are basically a good list of that already.
>
> This is still a somewhat subjective process, and if it had been in place in
> '99-'06, it would have been fairly fraught. However, I think most of the
> "action" in open source organization has moved on to other areas (e.g.,
> foundation structure, CoCs, etc.), and the field has matured in other ways,
> so I think this is now a practicable approach in ways it would not have been
> a decade or even five years ago.
>
> Miscellaneous notes
>
> - I don't recommend merely updating the existing "popular and..." list
>   through a subjective or one-time process. The politics of that will be
>   messy, and without a documented, mostly-objective, data-driven method,
>   it'll again become an outdated mess.
> - The OSD should probably be updated. At the least, this should address
>   things like whether a formal patent grant is required of new licenses;
>   more ambitiously, it might follow Open Definition 2.x by splitting out
>   open licenses from open works.
> - With SPDX and Fedora providing more comprehensive lists of FOSS licenses,
>   it might make sense for OSI to link to those as "extended" resources, to
>   reduce pressure from obscure license authors to get their licenses
>   approved.
> - The biggest pressure on this process will continue to be licenses that
>   try to open up space for new commercial business models (e.g., Fair
>   Source). The more OSI can write/document/buttress OSD #1, the better.
> - I used to think a license wizard was a good idea, but I don't any more. I
>   thought the copyleft spectrum was really the only important
>   decision-making factor, which made the idea plausible, but non-copyleft
>   factors matter much more than I once thought, and make simplifying to a
>   "wizard" too hard for OSI (though perhaps still plausible for a third
>   party).
> - Documentation of what the copyleft spectrum is, what the key licenses on
>   it are, and what other factors might be relevant is still a good idea,
>   but it is secondary to getting the basic lists right.
>
> HTH-
>
> Luis
>

-- 
henrik.ingo at avoinelama.fi
+358-40-5697354        skype: henrik.ingo            irc: hingo
www.openlife.cc

My LinkedIn profile: http://fi.linkedin.com/pub/henrik-ingo/3/232/8a7