[License-discuss] notes on a systematic approach to "popular" licenses
Richard Fontana
fontana at sharpeleven.org
Thu Apr 6 15:50:30 UTC 2017
Interesting, but at first glance the data seems too unreliable to be of
any use. I started checking the identified projects under the so-called
Clear BSD license (the FSF-free, never-OSI-submitted BSD variant that
explicitly excludes patent licenses) and the ones I looked at were all
spurious matches.
Richard
On Thu, Apr 6, 2017, at 11:21 AM, Luis Villa wrote:
> Yet another (inevitably flawed) data set:
> https://libraries.io/licenses
>
> On Tue, Jan 10, 2017, 11:07 AM Luis Villa <luis at lu.is> wrote:
>> [Apparently I got unsubscribed at some point, so if you've sent an
>> email here in recent months seeking my feedback, please resend.]
>>
>> Hey, all-
>> I promised some board members a summary of my investigation in '12-
>> '13 into updating, supplementing, or replacing the "popular licenses"
>> list. Here goes.
>>
>> *tl;dr*
>> I think OSI should have a data-driven short license list with a
>> replicable and transparent methodology, supplemented by a new-and-
>> good(?) list that captures licenses that aren't yet popular but are
>> high quality and have some substantial improvement that advances the
>> goals of OSI.
>>
>> *Purposes of non-comprehensive lists*
>> If you Google "open source licenses", OSI pages are the top two hits.
>> Historically, those pages were not very helpful unless you already
>> knew something about open source. Having a shorter "top" list can
>> help make the OSI website more useful to newcomers by suggesting a
>> starting place for their exploration and education about open source.
>>
>> In addition, third parties often look to OSI as a trusted (neutral?)
>> source for "top" or "best" licenses that they can incorporate into
>> products. (The full OSI-approved list is not practical for many
>> applications.) For example, if OSI had an up-to-date short list, it
>> might have been the basis for GitHub's license chooser.
>>
>> A list that is purely based on popularity would freeze open source in
>> a particular time, likely making it hard for new licenses with
>> important innovations to get adoption. However, a list based on more
>> subjective criteria is hard to create and update.
>>
>> *Past attempts*
>> The proliferation report attempted to address this problem by
>> categorizing existing licenses. These categories were,
>> intentionally or not, seen as the "popular or strong communities
>> list" and "everything else". Without a process or clear set of
>> criteria to update the "popular" list, however, it became frozen in
>> time. It is now difficult to credibly recommend the list to
>> newcomers or third parties (MPL 1.1 is deprecated; no mention of
>> Black Duck #4 GPL v3; etc.).
>>
>> There was also substantial work done towards a license "chooser" or
>> "wizard". However, this runs into some of the same problems - either
>> the chooser is opinionated (and so pisses off people, and potentially
>> locks the licenses in time) or is borderline-useless for newcomers
>> (because it still requires substantial additional research after
>> using it).
>>
>> *Data-driven "popular" list*
>> With all that in mind, I think that OSI needs a (mostly) data-driven
>> "popular" shortlist, based on a scan of public code + application of
>> (mostly?) objective rules to the outcome of that scan.
>>
>> To maintain OSI's reputation as being (reasonably) neutral and
>> independent, OSI should probably avoid basing this on third-party
>> license surveys (e.g., Black Duck[1]) unless their methodologies and
>> data sources are well-documented. Ideally someone will write code so
>> that the "survey" can be run by OSI and reproduced by others.
>>
>> Hard decisions on how to collect and "process" the data will include:
>> * *choice of data sources:* What data sources are drawn on? Key
>> Linux distros? GitHub? per-language repos like maven, cpan, npm,
>> etc?
>> * *what are you counting?* Projects? (May favor small, throwaway
>> projects?) Lines of code? (May favor the largest, most complex
>> projects?) ... ?
>> * *which license tools?* Some scanners are more aggressive in trying
>> to identify *something*, while others prefer accuracy over
>> comprehensiveness. In 2013 there was no good answer to this, but
>> my understanding is that fossology now has three different
>> scanners, so for OSI's purposes it may be sufficient to take those
>> three and average.
>> * Could throw in Black Duck or other non-transparent surveys as a
>> fourth, fifth, etc.?
>> * *new versions?* If a new version exists but isn't widely adopted
>> yet, how does the list reflect that? e.g., MPL 1.1 still shows up
>> in Black Duck's survey; should OSI replace 1.1 with 2.0 in the
>> "processed" list? What about GPL v2 v. v3? BSD/MIT v. UPL?
>> * *gaps/"mistakes":* What happens when the board thinks the data is
>> incorrect? :) e.g., should ISC be listed?
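
To make the trade-offs in the list above concrete, here is a minimal Python sketch. It is not anything OSI or anyone in this thread actually ran. The scanner names, project names, line counts, and the SUCCESSOR mapping are all made up for illustration. It averages per-scanner license shares (one possible reading of "take those three and average") and lets the weight be either one per project or lines of code.

```python
from collections import defaultdict

# Hypothetical scan output: scanner name -> {project: detected license or None}.
SCANS = {
    "scanner_a": {"proj1": "MIT", "proj2": "GPL-2.0", "proj3": "MPL-1.1"},
    "scanner_b": {"proj1": "MIT", "proj2": "GPL-2.0", "proj3": None},
    "scanner_c": {"proj1": "MIT", "proj2": "GPL-3.0", "proj3": "MPL-1.1"},
}
# Hypothetical project sizes in lines of code.
LOC = {"proj1": 1_000, "proj2": 50_000, "proj3": 9_000}
# Assumed policy choice: fold a superseded version into its successor.
SUCCESSOR = {"MPL-1.1": "MPL-2.0"}


def shares(detections, weights, fold_versions=False):
    """Return each license's share of the total weight for one scanner."""
    totals = defaultdict(float)
    for project, license_id in detections.items():
        if license_id is None:  # scanner declined to guess; skip it
            continue
        if fold_versions:
            license_id = SUCCESSOR.get(license_id, license_id)
        totals[license_id] += weights[project]
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {lic: weight / grand_total for lic, weight in totals.items()}


def averaged_ranking(scans, weights, fold_versions=False):
    """Average the per-scanner shares, then sort by descending share."""
    combined = defaultdict(float)
    for detections in scans.values():
        for lic, share in shares(detections, weights, fold_versions).items():
            combined[lic] += share / len(scans)
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Same scans, two different answers to "what are you counting?"
by_project = averaged_ranking(SCANS, {project: 1 for project in LOC})
by_loc = averaged_ranking(SCANS, LOC)
folded = dict(averaged_ranking(SCANS, LOC, fold_versions=True))
```

With this toy data, counting projects puts MIT first, while counting lines of code puts GPL-2.0 first. The version-folding switch is equally a policy decision rather than a measurement.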
>> Part of why we didn't go very far in 2013 is that there are no
>> great answers for these - different answers will reflect different
>> values, and have different engineering impact. They're all hard
>> choices for the board, the developers, hopefully license-discuss, and
>> perhaps a broader community.
>>
>> Hat tip: Daniel German was invaluable to me in thinking through these
>> questions.
>>
>> *Supplementing with high-quality, value-adding options*
>> To encourage progress, while still avoiding proliferation, I'd
>> suggest a second list of licenses that are good but not (yet?)
>> popular. "Good" would be defined as something like:
>> 1. meets the OSD
>> 2. isn't on the data-driven popularity list
>> 3. drafted by an attorney (at minimum) or by a collaborative, public
>> drafting process with clear support from a sponsoring-maintaining
>> organization (ideal)
>> 4. has a new "feature" that is firmly in keeping with the overall
>> goals of open source and can be concisely explained in a few
>> sentences (e.g., for UPL, "GPL-compatible permissive license with
>> explicit patent grant")
>> 1. but not "just for a particular community" - has to be at least
>> plausibly applicable to most open source projects
>> 2. this is unavoidably subjective; suggest having it fall to the
>> board with pre-discussion on license-review.
>>
>> #4 allows for some innovation (and OSI support of such innovation)
>> while #3 applies a quality filter. (Both #3 and #4 have
>> anti-proliferation effects.) Hopefully licenses that meet #3 and #4 would
>> eventually move into #2, but you could imagine placing a time limit
>> on this list; if you're not in the top 10 most popular within five
>> years, then you get retired? But not sure that's a good idea at all
>> - just throwing it out as one option.
>>
>> If a new license meets #1, but not #3 and #4, then OSI's formal
>> policy should be to approve, but bury it in one of the other
>> proliferation list groups. (Those groups are actually quite good, and
>> should be fairly non-controversial — once you have a good policy for
>> what gets in the more "favored" groups.) I don't think a new
>> "deprecated" group is necessary - the proliferation categories are
>> basically a good list of that already.
>>
>> This is still a somewhat subjective process, and if it had been in
>> place in '99-'06, it would have been fairly fraught. However, I
>> think most of the "action" in open source organization has moved on
>> to other areas (e.g., foundation structure, CoCs, etc.), and the
>> field has matured in other ways, so I think this is now a
>> practicable approach in ways it would not have been a decade or even
>> five years ago.
>>
>> *Miscellaneous notes*
>> * I don't recommend merely updating the existing "popular and..."
>> list through a subjective or one-time process. The politics of
>> that will be messy, and without a documented, mostly-objective,
>> data-driven method, it'll again become an outdated mess.
>> * The OSD should probably be updated. At the least this should be by
>> addressing things like whether a formal patent grant is required
>> of new licenses; more ambitiously it might follow Open Data
>> Definition 2.x[2] by splitting out open licenses from open works.
>> * With SPDX and Fedora providing more comprehensive lists of FOSS
>> licenses, it might make sense for OSI to link to those as
>> "extended" resources, to reduce pressure from obscure license
>> authors to get their license approved.
>> * The biggest pressure on this process will continue to be licenses
>> that try to open up space for new commercial business models
>> (e.g., Fair Source). The more OSI can write/document/buttress OSD
>> #1, the better.
>> * I used to think a license wizard was a good idea, but I don't any
>> more. I thought copyleft spectrum was really the only important
>> decision-making factor, which made the idea plausible, but non-
>> copyleft factors matter much more than I once thought, and make
>> simplifying to a "wizard" too hard for OSI (though perhaps still
>> plausible for a third party).
>> * Documentation of what the copyleft spectrum *is*, what the key
>> licenses on it are, and what other factors might be relevant, is
>> still a good idea, but is secondary to getting the basic lists
>> right.
>>
>> HTH-
>> Luis
> --
> *Luis Villa: Open Law and Strategy[3]*
> *+1-415-938-4552*
Links:
1. https://www.blackducksoftware.com/top-open-source-licenses
2. http://opendefinition.org/od/2.1/en/
3. http://lu.is