[License-discuss] notes on a systematic approach to "popular" licenses

Luis Villa luis at lu.is
Tue Jan 10 16:07:53 UTC 2017


[Apparently I got unsubscribed at some point, so if you've sent an email
here in recent months seeking my feedback, please resend.]

Hey, all-
I promised some board members a summary of my investigation in '12-'13 into
updating, supplementing, or replacing the "popular licenses" list. Here
goes.


*tl;dr*
I think OSI should have a data-driven short license list with a replicable
and transparent methodology, supplemented by a new-and-good(?) list that
captures licenses that aren't yet popular but are high quality and have
some substantial improvement that advances the goals of OSI.


*Purposes of non-comprehensive lists*
If you Google "open source licenses", OSI pages are the top two hits.
Historically, those pages were not very helpful unless you already knew
something about open source. Having a shorter "top" list can help make the
OSI website more useful to newcomers by suggesting a starting place for
their exploration and education about open source.

In addition, third parties often look to OSI as a trusted (neutral?) source
for "top" or "best" licenses that they can incorporate into products. (The
full OSI-approved list is not practical for many applications.) For
example, if OSI had an up-to-date short list, it might have been the basis
for GitHub's license chooser.

A list that is purely based on popularity would freeze open source in a
particular time, likely making it hard for new licenses with important
innovations to get adoption. However, a list based on more subjective
criteria is hard to create and update.

*Past attempts*

The proliferation report attempted to address this problem by categorizing
existing licenses. These categories were, intentionally or not, seen as the
"popular or strong communities list" and "everything else". Without a
process or clear set of criteria to update the "popular" list, however, it
became frozen in time. It is now difficult to credibly recommend the list
to newcomers or third parties (MPL 1.1 is deprecated; there is no mention of
GPL v3, #4 in Black Duck's survey; etc.).

There was also substantial work done towards a license "chooser" or
"wizard". However, this runs into some of the same problems - either the
chooser is opinionated (and so pisses people off, and potentially locks the
licenses in time) or is borderline-useless for newcomers (because it still
requires substantial additional research after using it).

*Data-driven "popular" list*

With all that in mind, I think that OSI needs a (mostly) data-driven
"popular" shortlist, based on a scan of public code + application of
(mostly?) objective rules to the outcome of that scan.

To maintain OSI's reputation as being (reasonably) neutral and independent,
OSI should probably avoid basing this on third-party license surveys (e.g.,
Black Duck <https://www.blackducksoftware.com/top-open-source-licenses>)
unless their methodologies and data sources are well-documented. Ideally someone
will write code so that the "survey" can be run by OSI and reproduced by
others.

Hard decisions on how to collect and "process" the data will include:

   - *choice of data sources:* What data sources are drawn on? Key Linux
   distros? GitHub? Per-language repositories like Maven, CPAN, npm, etc.?
   - *what are you counting?* Projects? (May favor small, throwaway
   projects?) Lines of code? (May favor the largest, most complex projects?)
   ... ?
   - *which license tools?* Some scanners are more aggressive in trying to
   identify *something*, while others prefer accuracy over
   comprehensiveness. In 2013 there was no good answer to this, but my
   understanding is that FOSSology now has three different scanners, so for
   OSI's purposes it may be sufficient to take those three and average them.
      - Could throw in Black Duck or other non-transparent surveys as a
      fourth, fifth, etc.?
   - *new versions?* If a new version exists but isn't widely adopted
   yet, how does the list reflect that? e.g., MPL 1.1 still shows up in Black
   Duck's survey; should OSI replace 1.1 with 2.0 in the "processed" list?
   What about GPL v2 v. v3? BSD/MIT v. UPL?
   - *gaps/"mistakes":* What happens when the board thinks the data is
   incorrect? :) e.g., should ISC be listed?
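To make the counting and averaging questions above concrete, here is a minimal Python sketch. It assumes a hypothetical input format in which each scanner reports, per project, a detected license identifier (or nothing) plus a line count. It computes each license's share per scanner, optionally folds superseded versions into newer ones via a caller-supplied alias map, averages the shares across scanners, and ranks the result. The function names, the input format, and the alias map are illustrative assumptions, not FOSSology's (or anyone else's) actual output or API.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical scanner output: project name -> (license id or None, lines of code).
ScanResult = Dict[str, Tuple[Optional[str], int]]


def license_shares(scan: ScanResult, weight_by_loc: bool,
                   aliases: Dict[str, str]) -> Dict[str, float]:
    """Each license's share of the corpus, as seen by one scanner.

    Counting projects treats every project equally (may favor small,
    throwaway projects); weighting by lines of code favors large projects.
    Projects with no detected license are skipped.
    """
    totals: Counter = Counter()
    for license_id, loc in scan.values():
        if license_id is None:
            continue
        # Optionally fold an old version into its successor (e.g. MPL 1.1 -> 2.0).
        license_id = aliases.get(license_id, license_id)
        totals[license_id] += loc if weight_by_loc else 1
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {lic: count / grand_total for lic, count in totals.items()}


def averaged_ranking(scans: List[ScanResult], weight_by_loc: bool = False,
                     top_n: int = 10,
                     aliases: Optional[Dict[str, str]] = None) -> List[Tuple[str, float]]:
    """Average per-license shares across scanners and return the top N."""
    aliases = aliases or {}
    per_scanner = [license_shares(s, weight_by_loc, aliases) for s in scans]
    all_licenses = set().union(*per_scanner)
    averaged = {
        lic: sum(shares.get(lic, 0.0) for shares in per_scanner) / len(per_scanner)
        for lic in all_licenses
    }
    # Highest share first; ties broken by id so reruns give the same order.
    ranked = sorted(averaged.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


if __name__ == "__main__":
    # Made-up data: the two scanners disagree about proj3.
    scanner_a = {"proj1": ("MIT", 500), "proj2": ("GPL-2.0-only", 20000),
                 "proj3": ("MIT", 100)}
    scanner_b = {"proj1": ("MIT", 500), "proj2": ("GPL-2.0-only", 20000),
                 "proj3": (None, 100)}
    print(averaged_ranking([scanner_a, scanner_b]))                      # by projects
    print(averaged_ranking([scanner_a, scanner_b], weight_by_loc=True))  # by lines
```

With this made-up data, MIT comes out on top when counting projects and GPL-2.0-only when counting lines of code, which is exactly the kind of values choice the list above describes.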

Part of why we didn't go very far in 2013 is that there are no great
answers for these - different answers will reflect different values, and
have different engineering impact. They're all hard choices for the board,
the developers, hopefully license-discuss, and perhaps a broader community.

Hat tip: Daniel German was invaluable to me in thinking through these
questions.

*Supplementing with high-quality, value-adding options*
To encourage progress, while still avoiding proliferation, I'd suggest a
second list of licenses that are good but not (yet?) popular. "Good" would
be defined as something like:

   1. meets the OSD
   2. isn't on the data-driven popularity list
   3. drafted by an attorney (at minimum) or by a collaborative, public
   drafting process with clear support from a sponsoring-maintaining
   organization (ideal)
   4. has a new "feature" that is firmly in keeping with the overall goals
   of open source and can be concisely explained in a few sentences (e.g., for
   UPL, "GPL-compatible permissive license with explicit patent grant")
      1. but not "just for a particular community" - it has to be at least
      plausibly applicable to most open source projects
      2. this is unavoidably subjective; suggest having it fall to the
      board with pre-discussion on license-review.

#4 allows for some innovation (and OSI support of such innovation) while #3
applies a quality filter. (Both #3 and #4 have anti-proliferation effects.)
Hopefully licenses that meet #3 and #4 would eventually graduate to the
data-driven popular list, but you could imagine placing a time limit on this
list: if a license isn't in the top 10 most popular within five years, it gets
retired? But I'm not sure that's a good idea at all - just throwing it out as
one option.
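The five-year time limit floated above could be expressed as a simple rule. This minimal Python sketch assumes hypothetical inputs (the year a license was listed and its year-by-year rank on the data-driven popularity list) and treats a license that reaches the top 10 by the deadline as having graduated rather than retired. The names, the inclusive deadline, and the default values are illustrative choices, not settled policy.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SupplementalEntry:
    """A license on the hypothetical 'new and good' list."""
    license_id: str
    year_listed: int
    # Hypothetical data: year -> rank on the data-driven popularity list (1 = top).
    rank_by_year: Dict[int, int] = field(default_factory=dict)


def should_retire(entry: SupplementalEntry, current_year: int,
                  top_n: int = 10, grace_years: int = 5) -> bool:
    """True once the grace period is over and the license never made the top N in it."""
    if current_year - entry.year_listed < grace_years:
        return False  # still within the grace period
    deadline = entry.year_listed + grace_years
    reached_top = any(rank <= top_n
                      for year, rank in entry.rank_by_year.items()
                      if year <= deadline)
    return not reached_top


if __name__ == "__main__":
    entry = SupplementalEntry("ExampleLicense-1.0", 2017, {2018: 40, 2020: 25})
    print(should_retire(entry, 2019))  # False: grace period not over yet
    print(should_retire(entry, 2022))  # True: never reached the top 10 by 2022
```

Whether the deadline is inclusive, and what happens to a license that makes the top 10 only after it, are the kind of judgment calls that would still fall to the board.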

If a new license meets #1 but not both #3 and #4, then OSI's formal policy
should be to approve it, but bury it in one of the other proliferation list
groups. (Those groups are actually quite good, and should be fairly
non-controversial, once you have a good policy for what gets into the more
"favored" groups.) I don't think a new "deprecated" group is necessary -
the proliferation categories are basically a good list of that already.

This is still a somewhat subjective process, and if it had been in place in
'99-'06, it would have been fairly fraught. However, I think most of the
"action" in open source organization has moved on to other areas (e.g.,
foundation structure, CoCs, etc.), and the field has matured in other ways,
so I think this is now a practicable approach in ways it would not have
been a decade or even five years ago.

*Miscellaneous notes*

   - I don't recommend merely updating the existing "popular and..." list
   through a subjective or one-time process. The politics of that will be
   messy, and without a documented, mostly-objective, data-driven method,
   it'll again become an outdated mess.
   - The OSD should probably be updated. At minimum, the update should
   address questions like whether a formal patent grant is required of new
   licenses; more ambitiously, it might follow the Open Definition 2.x
   <http://opendefinition.org/od/2.1/en/> by splitting out open licenses
   from open works.
   - With SPDX and Fedora providing more comprehensive lists of FOSS
   licenses, it might make sense for OSI to link to those as "extended"
   resources, to reduce pressure from obscure license authors to get their
   license approved.
   - The biggest pressure on this process will continue to be licenses that
   try to open up space for new commercial business models (e.g., Fair
   Source). The more OSI can write/document/buttress OSD #1, the better.
   - I used to think a license wizard was a good idea, but I don't
   anymore. I thought the copyleft spectrum was really the only important
   decision-making factor, which made the idea plausible, but non-copyleft
   factors matter much more than I once thought, and make simplifying to a
   "wizard" too hard for OSI (though perhaps still plausible for a third
   party).
   - Documentation of what the copyleft spectrum *is*, what the key
   licenses on it are, and what other factors might be relevant, is still a
   good idea, but it is secondary to getting the basic lists right.

HTH-

Luis