[License-discuss] Backfilling Mailman archive gaps (was: Columbia S&T Law Review analysis of the OSI license-discuss mailing list)

Rick Moen rick at linuxmafia.com
Sun Mar 1 01:26:48 UTC 2020


Quoting McCoy Smith (mccoy at lexpan.law):

> License-approval only goes back to December, 2007; license-discuss
> goes back to 1999, but as far as I can tell doesn’t include complete
> discussion about approvals of licenses from 1999-2007 (those
> discussions are on now-dead links to Russ Nelson’s private webpage).
> I think a complete examination of the history of license approvals
> (particularly around the ones that were put early on the list) would
> require that data.  I’ve heard it exists somewhere, but not sure
> where.  Would be nice if it was publicly accessible.

That reminds me that I should (and now will) extend an offer, valid
indefinitely, to OSI to help merge in any archives that can be located
of earlier license-discuss and license-review traffic.

GNU Mailman bases each mailing list's 'pipermail' archive on a
cumulative file storing all received postings in receipt order, in Unix
mbox format.  On Debian, for example, Mailman appends license-discuss
postings to
/var/lib/mailman/archives/private/license-discuss.mbox/license-discuss.mbox
and the archiver sub-process then refreshes from that mbox file the
derived contents of the HTML and ASCII public archives, to reflect the
new traffic.

Because of that structure, it's not particularly difficult to merge
missing traffic into a GNU Mailman mailing list archive, _provided_ one
can hammer those missing messages into one or more mbox files -- and then
combine the several mbox files as a shell operation, and finally
re-generate the public archive, thus:

# su - list
$ cd /var/lib/mailman
$ bin/arch --wipe license-discuss 
$ exit
# exit
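
The commands above cover only the regeneration.  The combining step
that precedes it might be sketched as follows, assuming the recovered
traffic sits in a hypothetical file old-traffic.mbox, that all of it
predates the existing archive, and that each input file ends with a
newline.  The function name merge_mboxes and the file names are
illustrative only; they are not part of Mailman.

```shell
# merge_mboxes OUT IN1 [IN2 ...]
# Concatenate mbox files in the order given (oldest first).  If an input
# does not end with a blank line, add one, so that the next message's
# "From " separator line is not glued onto the previous message body.
merge_mboxes() {
    out=$1
    shift
    : > "$out"                      # start with an empty output file
    for f in "$@"; do
        cat "$f" >> "$out"
        # Last line non-empty?  Then append a separating blank line.
        if [ -s "$f" ] && [ "$(tail -n 1 "$f")" != "" ]; then
            printf '\n' >> "$out"
        fi
    done
}

# Example invocation (paths are assumptions based on the Debian layout
# described earlier; work on copies, not on the live archive file):
#   merge_mboxes /tmp/merged.mbox old-traffic.mbox license-discuss.mbox
```

The merged file would then replace the list's cumulative mbox file
before 'bin/arch --wipe' is run, ideally on a scratch system first.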

There's a _little_ more to it, having to do with finding message-body
lines inside the mbox that start with flush-left 'From ' and escaping
them as '>From ' (otherwise they get mistaken for message separators),
but otherwise this works well.  Of course, one tests the regeneration
_first_ on a scratch system, for caution's sake, and makes backups.
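
As a rough illustration of that hunt for stray 'From ' lines, here is a
minimal sketch.  It assumes separator lines have the common form
"From addr Www Mmm DD HH:MM:SS YYYY" and flags any other line that
begins with "From ".  The function names are hypothetical, and the
pattern is only a heuristic: separators carrying a timezone or other
variations would be flagged too, so the output needs review by eye.
(Mailman 2.1 also ships a bin/cleanarch script intended for this kind
of cleanup.)

```shell
# find_suspect_from FILE
# Print, with line numbers, lines that begin with "From " but do not
# look like an mbox separator line.
find_suspect_from() {
    grep -n '^From ' "$1" |
        grep -Ev '^[0-9]+:From [^ ]+ +[A-Z][a-z]{2} [A-Z][a-z]{2} +[0-9]{1,2} [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{4}$'
}

# escape_from_line FILE LINE_NUMBER
# Write FILE to standard output with the leading "From " on the given
# line escaped as ">From ".  The input file is left untouched.
escape_from_line() {
    sed "${2}s/^From />From /" "$1"
}

# Example (file name is an assumption):
#   find_suspect_from merged.mbox
#   escape_from_line merged.mbox 1234 > merged.fixed.mbox
```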

I used this technique to extend Silicon Valley Linux User Group's
mailing list archive _backward_ into history about a year before SVLUG's
1998 early-adoption of GNU Mailman:  Prior to that, SVLUG had used Brent
Chapman's (proprietary, Perl-based) 'majordomo' mailing list manager,
but, serendipitously, SVLUG had kept a copy of majordomo's cumulative mbox 
file.  Ergo, after some easy integration work, SVLUG's Mailman archive got
extended back to Sept. 24, 1997, long before SVLUG (or practically
anyone else) used Mailman in production.
(http://lists.svlug.org/archives/svlug/)

Because OSI no longer self-hosts Mailman, the final steps (above) would
need to be requested from OSI's hosting provider's technical staff, but,
with some luck, they would not refuse.  (Mindful of Pam's request to
please be nice, I'll not elaborate on what rightfully should be done
concerning hosting providers who refuse.  ;->  )

-- 
Cheers,        There's no theorem like Bayes's Theorem, like no theorem we know.
Rick Moen      Everything about it is appealing, everything about it is a wow.
rick at linux     Let out all that a-priori feeling, you've been concealing,
mafia.com      right up to now.   -- G.E.P. Box (w/apologies to Irving Berlin)


