[License-discuss] notes on a systematic approach to "popular" licenses

Sun Apr 9 21:54:54 UTC 2017

On Sun, Apr 9, 2017 at 9:20 PM, Luis Villa <luis at lu.is> wrote:
> What's the "right" level to scan at? Top-level project-declared LICENSE
> file? Or per-file throughout the tree? (Note that often those two measures
> don't agree with each other.)

My opinion is that the right approach is to scan at both levels and, if
needed, surface any inconsistencies or contradictions. Scanning only the
simpler top-level project-declared LICENSE or COPYING file is not enough:
in my experience at scale, it too often yields incomplete or inaccurate data.
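
As a rough illustration of what comparing the two levels could look like,
here is a minimal sketch. It is not how ScanCode works internally: the
function name, the license keys and the sample scan results are all
hypothetical, and it assumes some scanner has already produced a set of
license keys for the top-level declaration and for each file.

```python
from collections import defaultdict


def find_license_conflicts(declared, per_file):
    """Map each license key detected in a file but absent from the
    project-level declaration to the list of files where it was found.

    `declared` is a set of license keys from the top-level LICENSE file.
    `per_file` maps a file path to the set of keys detected in that file.
    Both are assumed inputs from some prior scan (hypothetical format).
    """
    declared = set(declared)
    conflicts = defaultdict(list)
    for path, detected in sorted(per_file.items()):
        # Keys present in the file but not declared at the project level.
        for key in sorted(set(detected) - declared):
            conflicts[key].append(path)
    return dict(conflicts)


# Hypothetical scan results, for illustration only.
declared = {"mit"}
per_file = {
    "src/main.c": {"mit"},
    "src/vendor/zlib.c": {"zlib"},
    "docs/index.txt": set(),
    "lib/parser.c": {"mit", "gpl-2.0"},
}

conflicts = find_license_conflicts(declared, per_file)
for key, paths in sorted(conflicts.items()):
    print(key, paths)
```

With this sample data, the files under lib/ and src/vendor/ are reported
because they carry licenses that the top-level declaration does not mention.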

That said, I am the maintainer of the open source ScanCode toolkit, a
fresh attempt at building a better mousetrap for license scanning:

https://github.com/nexB/scancode-toolkit

My goal is simple:
I want the licensing of all open source code to be a solved problem,
not a question mark, i.e. to work towards 100% licensing clarity and
eventually ensure that no piece of existing open source code raises
licensing questions for a user or aspiring user.

For that I would like to scan it **all**... and set up a community peer
review site so we can help every open source project add, refine or clean up
any missing, incomplete, inaccurate or contradictory licensing. Or at least
make the data open and available for anyone to query.

The main drag is, as always, resource availability (human time, network,
bandwidth and computing power) to fetch and scan everything from every
package manager and forge (SourceForge, GitHub, etc.), which represents
a significant[sic] number of terabytes.
This could become a lesser issue on the fetch side when softwareheritage.org
is fully operational. But still.

If anyone is interested in this, please contact me!
-- 
Cordially
Philippe Ombredanne