[License-discuss] notes on a systematic approach to "popular" licenses

Luis Villa luis at lu.is
Sun Apr 9 19:20:02 UTC 2017


On Sun, Apr 9, 2017 at 11:57 AM Philippe Ombredanne <pombredanne at nexb.com>
wrote:

> > On Thu, Apr 6, 2017 at 6:19 PM Philippe Ombredanne <pombredanne at nexb.com>
> > wrote:
> >>
> >> On Thu, Apr 6, 2017 at 5:21 PM, Luis Villa <luis at lu.is> wrote:
> >> > On Tue, Jan 10, 2017, 11:07 AM Luis Villa <luis at lu.is> wrote:
> >> >>
> >> >> Hey, all-
> >> >> I promised some board members a summary of my investigation in
> >> >> '12-'13 into updating, supplementing, or replacing the "popular
> >> >> licenses" list. Here goes.
> >> [...]
> >> > Yet another (inevitably flawed) data set:
> >> > https://libraries.io/licenses
> >>
> >> With the merit that the all the underlying code is FLOSS.
> >>
> >> Another possible source --always biased-- could be Debian's popcon and
> >> some cross ref with debsources.
>
>
> On Fri, Apr 7, 2017 at 11:54 AM, Andrew Nesbitt <andrew at libraries.io>
> wrote:
> > "inevitably flawed", would be great to get some feedback on how/why it's
> > flawed so I can improve it?
> >
> > System-level package managers are in the pipeline for the end of the
> > year, but there are so many fewer packages there that I can't see it
> > moving the needle much.
>
> Andrew: my comment on "inevitably flawed" was to echo Luis' point that any
> open source license popularity contest is likely to be flawed and biased
> one way or another, regardless of the data set considered as a basis.
>

Hi, Andrew-
For some reason your email never made it through to me; just saw Philippe's
response coming through.

I added "inevitably flawed" as a shorthand to fend off the basic critiques
that always accompany mentions of surveys on this list. The primary
critiques are:

   - What's the "right" set of data sources to draw from? By deciding to
   include (or leave out) any particular repo, you inevitably impact license
   popularity, and also inevitably you can't include them all.
   - What's the "right" metric for popularity? projects? files? LOCs? usage
   of projects? For example, if two projects are of the same complexity, but
   one is widely used and the other hardly used at all, should they count the
   same? What if one is very simple, the other very complex?
      - What about unmaintained/old code?
      - What's the "right" level to scan at? Top-level project-declared
   LICENSE file? Or per-file throughout the tree? (Note that often those two
   measures don't agree with each other.)
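
To make that last point concrete, here's a rough, deliberately naive sketch
of the two scan levels. This is mine, not how any of the services discussed
here actually work; the regex "markers", filenames, and function names are
just illustrative stand-ins for what a real scanner (ScanCode, licensee,
etc.) does far more carefully:

    # Illustrative only: compare a top-level license declaration with a
    # per-file tally over the same tree. The "markers" below are toy
    # heuristics, not real license detection.
    import os
    import re

    MARKERS = {
        "MIT": re.compile(r"MIT License|Permission is hereby granted, free of charge"),
        "GPL-2.0": re.compile(r"GNU General Public License.*version 2", re.S),
        "Apache-2.0": re.compile(r"Apache License,? Version 2\.0"),
    }

    def guess_license(text):
        for name, pattern in MARKERS.items():
            if pattern.search(text):
                return name
        return None

    def top_level_license(project_dir):
        # Level 1: trust the project-declared LICENSE/COPYING file only.
        for candidate in ("LICENSE", "LICENSE.txt", "COPYING"):
            path = os.path.join(project_dir, candidate)
            if os.path.isfile(path):
                with open(path, errors="replace") as f:
                    return guess_license(f.read())
        return None

    def per_file_licenses(project_dir):
        # Level 2: tally license mentions across every file in the tree.
        counts = {}
        for root, _dirs, files in os.walk(project_dir):
            for name in files:
                try:
                    with open(os.path.join(root, name), errors="replace") as f:
                        found = guess_license(f.read(8192))
                except OSError:
                    continue
                if found:
                    counts[found] = counts.get(found, 0) + 1
        return counts

    if __name__ == "__main__":
        import sys
        project = sys.argv[1] if len(sys.argv) > 1 else "."
        print("top-level:", top_level_license(project))
        print("per-file: ", per_file_licenses(project))

Run something like that over a tree that vendors in third-party code and the
two answers will often disagree; which one you count changes which license
"wins" the popularity contest.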

I feel that there are no right or wrong answers to these questions;
different surveys have different purposes. But others disagree: every time
we discuss this subject here, someone pops up and says "no, this service
does it wrong, they should do X instead". Because no service can please
everyone, I know yours displeases someone :)

[There is also a question as to whether proprietary methodologies should be
ignored completely or taken as another data point with an appropriate grain
of salt. I don't think OSI can answer that question yet, but it probably
does have right/wrong answers.]

Hope that clarifies where I was coming from; happy to chat whenever if it
doesn't.

Luis
-- 

*Luis Villa: Open Law and Strategy <http://lu.is>*
*+1-415-938-4552*