[License-discuss] notes on a systematic approach to "popular" licenses

Luis Villa luis at lu.is
Thu Apr 6 15:21:38 UTC 2017


Yet another (inevitably flawed) data set:
https://libraries.io/licenses

On Tue, Jan 10, 2017, 11:07 AM Luis Villa <luis at lu.is> wrote:

> [Apparently I got unsubscribed at some point, so if you've sent an email
> here in recent months seeking my feedback, please resend.]
>
> Hey, all-
> I promised some board members a summary of my investigation in '12-'13
> into updating, supplementing, or replacing the "popular licenses" list.
> Here goes.
>
>
> *tl;dr*
> I think OSI should have a data-driven short license list with a
> replicable and transparent methodology, supplemented by a new-and-good(?)
> list that captures licenses that aren't yet popular but are high quality
> and have some substantial improvement that advances the goals of OSI.
>
>
> *Purposes of non-comprehensive lists*
> If you Google "open source licenses", OSI pages are the top two hits.
> Historically, those pages were not very helpful unless you already knew
> something about open source. Having a shorter "top" list can help make the
> OSI website more useful to newcomers by suggesting a starting place for
> their exploration and education about open source.
>
> In addition, third parties often look to OSI as a trusted (neutral?)
> source for "top" or "best" licenses that they can incorporate into
> products. (The full OSI-approved list is not practical for many
> applications.) For example, if OSI had an up-to-date short list, it might
> have been the basis for GitHub's license chooser.
>
> A list that is purely based on popularity would freeze open source in a
> particular time, likely making it hard for new licenses with important
> innovations to get adoption. However, a list based on more subjective
> criteria is hard to create and update.
>
> *Past attempts*
>
> The proliferation report attempted to address this problem by categorizing
> existing licenses. These categories were, intentionally or not, seen as the
> "popular or strong communities list" and "everything else". Without a
> process or clear set of criteria to update the "popular" list, however, it
> became frozen in time. It is now difficult to credibly recommend the list
> to newcomers or third parties (MPL 1.1 is deprecated; there is no mention
> of GPL v3, which is #4 in Black Duck's survey; etc.).
>
> There was also substantial work done towards a license "chooser" or
> "wizard". However, this runs into some of the same problems - either the
> chooser is opinionated (and so pisses off people, and potentially locks the
> licenses in time) or is borderline-useless for newcomers (because it still
> requires substantial additional research after using it).
>
> *Data-driven "popular" list*
>
> With all that in mind, I think that OSI needs a (mostly) data-driven
> "popular" shortlist, based on a scan of public code + application of
> (mostly?) objective rules to the outcome of that scan.
>
> To maintain OSI's reputation as being (reasonably) neutral and
> independent, OSI should probably avoid basing this on third-party license
> surveys (e.g., Black Duck
> <https://www.blackducksoftware.com/top-open-source-licenses>) unless
> their methodologies and data sources are well-documented. Ideally someone
> will write code so that the "survey" can be run by OSI and reproduced by
> others.
>
> Hard decisions on how to collect and "process" the data will include:
>
>    - *choice of data sources:* What data sources are drawn on? Key Linux
>    distros? GitHub? Per-language repos like Maven, CPAN, npm, etc.?
>    - *what are you counting?* Projects? (May favor small, throwaway
>    projects?) Lines of code? (May favor the largest, most complex projects?)
>    ... ?
>    - *which license tools?* Some scanners are more aggressive in trying
>    to identify *something*, while others prefer accuracy over
>    comprehensiveness. In 2013 there was no good answer to this, but my
>    understanding is that fossology now has three different scanners, so for
>    OSI's purposes it may be sufficient to take those three and average.
>       - Could throw in Black Duck or other non-transparent surveys as a
>       fourth, fifth, etc.?
>    - *new versions?* If a new version exists but isn't widely adopted
>    yet, how does the list reflect that? e.g., MPL 1.1 still shows up in Black
>    Duck's survey; should OSI replace 1.1 with 2.0 in the "processed" list?
>    What about GPL v2 v. v3? BSD/MIT v. UPL?
>    - *gaps/"mistakes":* What happens when the board thinks the data is
>    incorrect? :) e.g., should ISC be listed?
>
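The counting and scanner-averaging questions in the list above can be made concrete with a small sketch. Everything here is hypothetical: the project names, sizes, scanner labels ("a", "b", "c") and reported licenses are invented, and the functions are not taken from fossology or any other real tool. It only shows how the choice of unit (projects vs. lines of code) can reorder the same data, and what "take those three and average" might look like.

```python
from collections import defaultdict

# Hypothetical scan output: (project, lines_of_code, {scanner: license}).
# All names and numbers are made up for illustration.
SCANS = [
    ("tiny-util", 200, {"a": "MIT", "b": "MIT", "c": "MIT"}),
    ("web-app", 5_000, {"a": "Apache-2.0", "b": "Apache-2.0", "c": "MIT"}),
    ("kernel-ish", 94_800, {"a": "GPL-2.0", "b": "GPL-2.0", "c": "GPL-2.0"}),
]


def shares(scans, scanner, by_lines=False):
    """Return each license's share of the corpus according to one scanner."""
    totals = defaultdict(float)
    for _project, loc, results in scans:
        # Weight each project by its size, or count every project once.
        totals[results[scanner]] += loc if by_lines else 1
    grand_total = sum(totals.values())
    return {lic: n / grand_total for lic, n in totals.items()}


def averaged_ranking(scans, scanners, by_lines=False):
    """Average the per-scanner shares, then rank licenses by that average."""
    mean = defaultdict(float)
    for scanner in scanners:
        for lic, share in shares(scans, scanner, by_lines).items():
            mean[lic] += share / len(scanners)
    return sorted(mean.items(), key=lambda item: item[1], reverse=True)


by_project = averaged_ranking(SCANS, ["a", "b", "c"])
by_lines = averaged_ranking(SCANS, ["a", "b", "c"], by_lines=True)
print([lic for lic, _ in by_project])
print([lic for lic, _ in by_lines])
```

With this made-up data, counting projects puts the license of the small projects first, while counting lines puts the license of the one large project first. That is the "what are you counting?" trade-off in miniature.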
> Part of why we didn't go very far in 2013 is because there are no great
> answers for these - different answers will reflect different values, and
> have different engineering impact. They're all hard choices for the board,
> the developers, hopefully license-discuss, and perhaps a broader community.
>
> Hat tip: Daniel German was invaluable to me in thinking through these
> questions.
>
> *Supplementing with high-quality, value-adding options*
> To encourage progress, while still avoiding proliferation, I'd suggest a
> second list of licenses that are good but not (yet?) popular. "Good" would
> be defined as something like:
>
>    1. meets the OSD
>    2. isn't on the data-driven popularity list
>    3. drafted by an attorney (at minimum) or by a collaborative, public
>    drafting process with clear support from a sponsoring-maintaining
>    organization (ideal)
>    4. has a new "feature" that is firmly in keeping with the overall
>    goals of open source and can be concisely explained in a few sentences
>    (e.g., for UPL, "GPL-compatible permissive license with explicit patent
>    grant")
>       1. but not "just for a particular community" - has to be at least
>       plausibly applicable to most open source projects
>       2. this is unavoidably subjective; suggest having it fall to the
>       board with pre-discussion on license-review.
>
> #4 allows for some innovation (and OSI support of such innovation) while
> #3 applies a quality filter. (Both #3 and #4 have anti-proliferation
> effects.) Hopefully licenses that meet #3 and #4 would eventually move
> onto the popularity list (and so stop meeting #2), but you could imagine
> placing a time limit on this list; if you're not
> in the top 10 most popular within five years, then you get retired? But not
> sure that's a good idea at all - just throwing it out as one option.
>
> If a new license meets #1, but not #3 and #4, then OSI's formal policy
> should be to approve, but bury it in one of the other proliferation list
> groups. (Those groups are actually quite good, and should be fairly
> non-controversial — once you have a good policy for what gets in the more
> "favored" groups.) I don't think a new "deprecated" group is necessary -
> the proliferation categories are basically a good list of that already.
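The sorting rules described above can be sketched as a small decision function. This is one reading of the criteria, not an OSI policy: the field names and list labels are hypothetical, criterion #4 is a board judgment reduced here to a boolean, and treating a license that meets only one of #3 and #4 as "other approved" is an assumption, since the text only covers the case where both are missing.

```python
from dataclasses import dataclass


@dataclass
class License:
    name: str
    meets_osd: bool          # criterion 1
    on_popular_list: bool    # the opposite of criterion 2
    quality_drafting: bool   # criterion 3: attorney or public drafting process
    has_new_feature: bool    # criterion 4: a subjective board judgment


def classify(lic):
    """Place a license on one of the lists discussed in the message."""
    if not lic.meets_osd:
        return "not approved"
    if lic.on_popular_list:
        return "popular"
    if lic.quality_drafting and lic.has_new_feature:
        return "new-and-good"
    # Approved, but filed under one of the other proliferation groups.
    return "other approved"


# Hypothetical licenses, invented for illustration.
examples = [
    License("Example-Popular", True, True, True, False),
    License("Example-Innovative", True, False, True, True),
    License("Example-Vanity", True, False, False, False),
    License("Example-Restricted", False, False, True, True),
]
for lic in examples:
    print(lic.name, "->", classify(lic))
```

The optional five-year retirement rule is left out here, since the message floats it only as one possibility.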
>
> This is still a somewhat subjective process, and if it had been in place
> in '99-'06, it would have been fairly fraught. However, I think most of the
> "action" in open source organization has moved on to other areas (e.g.,
> foundation structure, CoCs, etc.), and the field has matured in other ways,
> so I think this is now a practicable approach in ways it would not have
> been a decade or even five years ago.
>
> *Miscellaneous notes*
>
>    - I don't recommend merely updating the existing "popular and..." list
>    through a subjective or one-time process. The politics of that will be
>    messy, and without a documented, mostly-objective, data-driven method,
>    it'll again become an outdated mess.
>    - The OSD should probably be updated. At the least this should be by
>    addressing things like whether a formal patent grant is required of new
>    licenses; more ambitiously it might follow Open Data Definition 2.x
>    <http://opendefinition.org/od/2.1/en/> by splitting out open licenses
>    from open works.
>    - With SPDX and Fedora providing more comprehensive lists of FOSS
>    licenses, it might make sense for OSI to link to those as "extended"
>    resources, to reduce pressure from obscure license authors to get their
>    license approved.
>    - The biggest pressure on this process will continue to be licenses
>    that try to open up space for new commercial business models (e.g., Fair
>    Source). The more OSI can write/document/buttress OSD #1, the better.
>    - I used to think a license wizard was a good idea, but I don't any
>    more. I thought copyleft spectrum was really the only important
>    decision-making factor, which made the idea plausible, but non-copyleft
>    factors matter much more than I once thought, and make simplifying to a
>    "wizard" too hard for OSI (though perhaps still plausible for a third
>    party).
>    - Documentation of what the copyleft spectrum *is*, what the key
>    licenses on it are, and what other factors might be relevant, is still a
>    good idea, but is secondary to getting the basic lists right.
>
> HTH-
>
> Luis
>
-- 

*Luis Villa: Open Law and Strategy <http://lu.is>*
*+1-415-938-4552*