[License-discuss] notes on a systematic approach to "popular" licenses

Richard Fontana fontana at sharpeleven.org
Thu Apr 6 15:50:30 UTC 2017

Interesting but at first glance the data seems too unreliable to be of
any use. I started checking the identified projects under the so-called
Clear BSD license (the FSF-free, never-OSI-submitted BSD variant that
explicitly excludes patent licenses) and the ones I looked at were all
spurious matches.


On Thu, Apr 6, 2017, at 11:21 AM, Luis Villa wrote:

> Yet another (inevitably flawed) data set: 

> https://libraries.io/licenses


> On Tue, Jan 10, 2017, 11:07 AM Luis Villa <luis at lu.is> wrote:

>> [Apparently I got unsubscribed at some point, so if you've sent an
>> email here in recent months seeking my feedback, please resend.]

>> Hey, all-

>> I promised some board members a summary of my investigation in '12-
>> '13 into updating, supplementing, or replacing the "popular licenses"
>> list. Here goes.

>> *tl;dr*

>> I think OSI should have a data-driven short license list with a
>> replicable and transparent methodology, supplemented by a new-and-
>> good(?) list that captures licenses that aren't yet popular but are
>> high quality and have some substantial improvement that advances the
>> goals of OSI.

>> *Purposes of non-comprehensive lists*

>> If you Google "open source licenses", OSI pages are the top two hits.
>> Historically, those pages were not very helpful unless you already
>> knew something about open source. Having a shorter "top" list can
>> help make the OSI website more useful to newcomers by suggesting a
>> starting place for their exploration and education about open source.

>> In addition, third parties often look to OSI as a trusted (neutral?)
>> source for "top" or "best" licenses that they can incorporate into
>> products. (The full OSI-approved list is not practical for many
>> applications.) For example, if OSI had an up-to-date short list, it
>> might have been the basis for GitHub's license chooser.

>> A list that is purely based on popularity would freeze open source in
>> a particular time, likely making it hard for new licenses with
>> important innovations to get adoption. However, a list based on more
>> subjective criteria is hard to create and update.

>> *Past attempts*

>> The proliferation report attempted to address this problem by
>> categorizing existing licenses. These categories were,
>> intentionally or not, seen as the "popular or strong communities
>> list" and "everything else". Without a process or clear set of
>> criteria to update the "popular" list, however, it became frozen in
>> time. It is now difficult to credibly recommend the list to
>> newcomers or third parties (MPL 1.1 is deprecated; no mention of
>> Blackduck #4 GPL v3; etc.).

>> There was also substantial work done towards a license "chooser" or
>> "wizard". However, this runs into some of the same problems - either
>> the chooser is opinionated (and so pisses off people, and potentially
>> locks the licenses in time) or is borderline-useless for newcomers
>> (because it still requires substantial additional research after
>> using it).

>> *Data-driven "popular" list*

>> With all that in mind, I think that OSI needs a (mostly) data-driven
>> "popular" shortlist, based on a scan of public code + application of
>> (mostly?) objective rules to the outcome of that scan.

>> To maintain OSI's reputation as being (reasonably) neutral and
>> independent, OSI should probably avoid basing this on third-party
>> license surveys (e.g., Black Duck[1]) unless their methodologies and
>> data sources are well-documented. Ideally someone will write code so
>> that the "survey" can be run by OSI and reproduced by others.

>> Hard decisions on how to collect and "process" the data will include:

>>  * *choice of data sources:* What data sources are drawn on? Key
>>    Linux distros? GitHub? per-language repos like maven, cpan, npm,
>>    etc?
>>  * *what are you counting?* Projects? (May favor small, throwaway
>>    projects?) Lines of code? (May favor the largest, most complex
>>    projects?) ... ?
>>  * *which license tools?* Some scanners are more aggressive in trying
>>    to identify *something*, while others prefer accuracy over
>>    comprehensiveness. In 2013 there was no good answer to this, but
>>    my understanding is that fossology now has three different
>>    scanners, so for OSI's purposes it may be sufficient to take those
>>    three and average.
>>    * Could throw in Black Duck or other non-transparent surveys as a
>>      fourth, fifth, etc.?
>>  * *new versions?* If a new version exists but isn't widely adopted
>>    yet, how does the list reflect that? e.g., MPL 1.1 still shows up
>>    in Black Duck's survey; should OSI replace 1.1 with 2.0 in the
>>    "processed" list? What about GPL v2 v. v3? BSD/MIT v. UPL?
>>  * *gaps/"mistakes":* What happens when the board thinks the data is
>>    incorrect? :) e.g., should ISC be listed?

>> Part of why we didn't go very far in 2013 is because there are no
>> great answers for these - different answers will reflect different
>> values, and have different engineering impact. They're all hard
>> choices for the board, the developers, hopefully license-discuss, and
>> perhaps a broader community.

>> Hat tip: Daniel German was invaluable to me in thinking through these
>> questions.

>> *Supplementing with high-quality, value-adding options*

>> To encourage progress, while still avoiding proliferation, I'd
>> suggest a second list of licenses that are good but not (yet?)
>> popular. "Good" would be defined as something like:
>>  1. meets the OSD
>>  2. isn't on the data-driven popularity list
>>  3. drafted by an attorney (at minimum) or by a collaborative, public
>>     drafting process with clear support from a sponsoring-maintaining
>>     organization (ideal)
>>  4. has a new "feature" that is firmly in keeping with the overall
>>     goals of open source and can be concisely explained in a few
>>     sentences (e.g., for UPL, "GPL-compatible permissive license with
>>     explicit patent grant")
>>    1. but not "just for a particular community" - has to be at least
>>       plausibly applicable to most open source projects
>>    2. this is unavoidably subjective; suggest having it fall to the
>>       board with pre-discussion on license-review.

>> #4 allows for some innovation (and OSI support of such innovation)
>> while #3 applies a quality filter. (Both #3 and #4 have
>> anti-proliferation effects.) Hopefully licenses that meet #3 and #4
>> would eventually move into #2, but you could imagine placing a time
>> limit on this list; if you're not in the top 10 most popular within
>> five years, then you get retired? But not sure that's a good idea at
>> all - just throwing it out as one option.

>> If a new license meets #1, but not #3 and #4, then OSI's formal
>> policy should be to approve, but bury it in one of the other
>> proliferation list groups. (Those groups are actually quite good, and
>> should be fairly non-controversial — once you have a good policy for
>> what gets in the more "favored" groups.) I don't think a new
>> "deprecated" group is necessary - the proliferation categories are
>> basically a good list of that already.

>> This is still a somewhat subjective process, and if it had been in
>> place in '99-'06, it would have been fairly fraught. However, I
>> think most of the "action" in open source organization has moved on
>> to other areas (e.g., foundation structure, CoCs, etc.), and the
>> field has matured in other ways, so I think this is now a
>> practicable approach in ways it would not have been a decade or even
>> five years ago.

>> *Miscellaneous notes*

>>  * I don't recommend merely updating the existing "popular and..."
>>    list through a subjective or one-time process. The politics of
>>    that will be messy, and without a documented, mostly-objective,
>>    data-driven method, it'll again become an outdated mess.
>>  * The OSD should probably be updated. At the least this should be by
>>    addressing things like whether a formal patent grant is required
>>    of new licenses; more ambitiously it might follow Open Data
>>    Definition 2.x[2] by splitting out open licenses from open works.
>>  * With SPDX and Fedora providing more comprehensive lists of FOSS
>>    licenses, it might make sense for OSI to link to those as
>>    "extended" resources, to reduce pressure from obscure license
>>    authors to get their license approved.
>>  * The biggest pressure on this process will continue to be licenses
>>    that try to open up space for new commercial business models
>>    (e.g., Fair Source). The more OSI can write/document/buttress OSD
>>    #1, the better.
>>  * I used to think a license wizard was a good idea, but I don't any
>>    more. I thought copyleft spectrum was really the only important
>>    decision-making factor, which made the idea plausible, but non-
>>    copyleft factors matter much more than I once thought, and make
>>    simplifying to a "wizard" too hard for OSI (though perhaps still
>>    plausible for a third party).
>>  * Documentation of what the copyleft spectrum *is*, what the key
>>    licenses on it are, and what other factors might be relevant, is
>>    still a good idea, but is secondary to getting the basic lists
>>    right.

>> HTH-

>> Luis

> -- 

> *Luis Villa: Open Law and Strategy[3]*

> *+1-415-938-4552*



  1. https://www.blackducksoftware.com/top-open-source-licenses
  2. http://opendefinition.org/od/2.1/en/
  3. http://lu.is