[License-review] For approval: The Cryptographic Autonomy License (Beta 4)

Wed Feb 12 20:18:40 UTC 2020

VanL writes:

> Hi Chris,
>
> Thanks for sharing your concerns.
>
> Let me start with your last point first:
>
>> Finally, I am going to repeat part of what I said in the cap-talk
>> thread: the choice of the license name is ironic, because though the
>> goals are good (ensure autonomy of users), it may actually create
>> difficulties in complying in the kinds of systems that actually are
>> doing so through cryptography and protocols (ocap actor model systems,
>> encrypted data vaults).  The problem is one that must be addressed for
>> the sake of user freedom, but I believe it is being addressed on the
>> incorrect layer.  Here we see an attempt to address it on the license
>> layer, but IMO this is wrong: there are active systems that exist and
>> are addressing this in such a way that the cryptographic tooling and
>> protocols themselves are the ones that help ensure greater user
>> autonomy.
>>
>
> You may be interested to know that I did not choose this name. It was
> specified by my client, whose founders have spent their professional life
> working on Cryptographic Autonomy - and this license was specifically
> designed in part to deal with issues that the technical layer can't handle
> well. The legal layer isn't sufficient, but neither is the technical layer.
> Code has bugs; protocols can be subverted. The license is a backstop and
> tool to stop an entirely different set of adversaries that are not well
> addressed by using code alone.

Could you specify what those problems are?

> Now, addressing your more-specific concerns, my response in brief is that
> you are incorrect as to the scope of what is required by the CAL.
>
> First, with regard to the source code requirement: The easy response is
> that there is nothing in the CAL that would require someone to generate new
> documentation. More generally, you are confusing configuration information
> generally with configuration information needed to install and use, which
> is the context of this provision.

The language of the section talks about use rather than execution, which
is broader.  Starting up and running a program is different from being
able to understand how to use its features.  I'd suggest adjusting the
language of the license to follow the GPL and say "execute" rather than
"use".

This is especially important since the CAL gives the impression that it
is being developed so that an entity on the other end is given enough of
a toolkit to run a comparable service.  So being unambiguous here is a
good idea, I think.

>> In my talk last year I was making
>> an argument that we could see emacs lisp as being both configuration
>> information and code (but the lack of network requirement permitted
>> private modification).  Here that argument need not even be made since
>> the CAL *explicitly* requires configuration information.
>>
>> I'm definitely worried then about:
>>  - The introduction of a requirement for documentation for *use*
>>  - The introduction of a requirement for configuration information,
>>    *especially in the context of network distribution requirements*.
>>
>
> I... confess I don't follow your argument here. I am aware of the LISPy
> "code is data is configuration"-type context - but I am having trouble
> imagining a network-aware Emacs creating any burden at all. Even if you
> look at typical network interactions that have been built into Emacs (news
> reader, email, etc), you would have to evaluate whether your interaction
> via Emacs turned the other party into a Recipient by transmitting a
> licensable element to the other party. In all the cases I am aware of,
> Emacs would be a client (in the typical client-server sense). In the case
> where one party is acting as a client, the terms of the interaction are
> generally set by the party acting as the server in a particular network
> transaction. I am not sure how requesting information from a third party,
> on terms set by the third party, would involve the sharing of *your*
> copyrightable/licensable material. Accordingly, the third party is not a
> "Recipient" according to the terms of the CAL and your Emacs configuration
> is safe - and private to you.
>
> If you modified Emacs to be a server, and serve up information, then that
> would be something else - but in that case it is no different than any
> other server.

The AGPL makes no distinction between client and server, so realistically
a server could make similar requests of clients participating in a
network connection (though AFAIK nobody has done this).  To quote:

  Notwithstanding any other provision of this License, if you modify the
  Program, your modified version must prominently offer all users
  interacting with it remotely through a computer network (if your
  version supports such interaction) an opportunity to receive the
  Corresponding Source of your version by providing access to the
  Corresponding Source from a network server at no charge, through some
  standard or customary means of facilitating copying of software.

But most clients are not licensed under the AGPL, so this doesn't really
come up.  If they were, the license would technically obligate an offer
of source from a client to a server.  (Due to the hierarchical nature of
client-to-server systems, this isn't something people think about much
either, I think.)  So I do think an Emacs mail client connecting to a
mail server, if it were under the AGPL, could be compelled to release
its Emacs configuration, including whatever mail filtering a user has
configured through Emacs Lisp.  By contrast, since Emacs is under the
GPL, which has no requirement for distribution on network communication,
this "privacy vulnerability" does not exist.

For this reason, I do not recommend usage of the AGPL for clients,
particularly ones with no distinction between code and data.  At any
rate, again, in the kind of actor ocap systems I am talking about there
is no distinction between client and server either.  This is why I
discourage usage of the AGPL for the new kinds of actor-ocap systems I
work on, despite having used and advocated it in the past.

The AGPL made sense to me when I thought of the world in terms of
fighting the hierarchies imposed by Web 2.0-style client-server systems.
It does not make sense in peer-to-peer networked systems where this
hierarchy does not exist.

>> The use of the word "operator" tells me that the intended use of this
>> license has been seen within the scope of a client-server style
>> architecture, or has been flavored by the experience of seeing that
>> rolled out as the primary way we've seen networked applications in the
>> last couple of decades.
>>
>
> You are mistaken. The CAL was written specifically for a peer-to-peer
> application,

In that case, I think my point above applies strongly to the CAL as
well.  Moreover, since the CAL explicitly calls out configuration and
encryption materials, the kind of "privacy attack" I have described is
even easier to mount, and not only in comparatively rare
code-and-data-blended systems.

> but was designed to be compatible with client-server interactions, as
> they are the most common. Nevertheless, even in a peer-to-peer
> interaction, you can still talk about a "Recipient" because each peer
> acts sometimes as a client and sometimes as a server, and those
> different roles give rise to different types of sharing (and thus
> different requirements).

That's true, though in an actor ocap style system the granularity of
such interactions is very small.  Does being a Recipient or User require
a human behind the wheel, or does the term apply to every object in the
system?

> In particular:
>>
>>  - It does not seem to anticipate that *every user* should be running
>>    their own application... not client-to-server, but peer-to-peer. [...]
>
>>   - Assume that object access is managed by capability security.  [...]
>
>
> I am familiar with actor-model ocap systems

Great, then I will not hesitate to go into a common actor model ocap
system usage pattern below.

> and this does not cause them any issue. The CAL is very careful to
> limit disclosure to the Recipient's user data; it does not require
> disclosure of the operator's private data.

How does it do this?  I see no clear guidelines that would help me
determine what the license requires of the kinds of programs I am
working on today.

To use actor model language, say object A1 on Alice's phone is
communicating with object B1 on some other computer somewhere (it could
be on another phone or on a VPS; it doesn't matter much), which we will
say is run by Bob.  A1 invokes a method of B1, passing in a reference to
A2, which also lives on Alice's phone.  To keep things simple, let's
assume A2 is a mutable cell, and that A1 holds a capability to revoke
B1's access to it if it deems appropriate.  In other words, Alice passed
a small storage location on her phone to B1.

Since you are familiar with actor ocap systems, you will realize this
kind of pattern is extremely common and will often happen millions of
times a day, spanning many machines.
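For readers less familiar with the pattern, here is a minimal,
single-process sketch of the setup above.  It is written in Python
rather than Goblins, the network hop between Alice's phone and Bob's
machine is elided, and `Cell`, `B1`, `make_caretaker`, and `Revoked` are
hypothetical names for illustration only.  The revocable forwarder is
the well-known "caretaker" pattern, which is one common way A1 could
hold a revocation capability; it is not something the CAL defines.

```python
class Cell:
    """A2: a minimal mutable cell living on Alice's phone."""

    def __init__(self, value=None):
        self._value = value

    def get(self):
        return self._value

    def set(self, value):
        self._value = value


class Revoked(Exception):
    """Raised when a holder uses a capability that has been revoked."""


def make_caretaker(target):
    """Return (forwarder, revoke).

    The forwarder is what gets handed to B1; A1 keeps revoke for itself.
    After revoke() is called, the forwarder no longer reaches the target.
    """
    state = {"target": target}

    class Forwarder:
        def get(self):
            if state["target"] is None:
                raise Revoked("access to cell revoked")
            return state["target"].get()

        def set(self, value):
            if state["target"] is None:
                raise Revoked("access to cell revoked")
            state["target"].set(value)

    def revoke():
        # Drop the only link from the forwarder to the real cell.
        state["target"] = None

    return Forwarder(), revoke


class B1:
    """Bob's object: stores something of value in whatever cell it is given."""

    def __init__(self):
        self.cell = None

    def remember(self, cell, value):
        self.cell = cell
        cell.set(value)


# A1's side of the interaction: B1 only ever sees the forwarder, never A2.
a2 = Cell()
forwarder, revoke = make_caretaker(a2)
b1 = B1()
b1.remember(forwarder, "something of value to Bob")
```

Calling `revoke()` afterwards cuts off B1's path to the cell
(`b1.cell.get()` then raises `Revoked`), but the value Bob stored is
still sitting in `a2` on Alice's phone, which is exactly where the
license questions below begin.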

Now let's examine what happens.

 - Say B1 puts some interesting information of value in the mutable
   cell A2.  A2 is now being used as a remote storage location.

 - Question: is B1 a Recipient to whom Alice owes remote access to that
   data?  You will observe that answering either way gives you
   problems:

   - If you answer no, then what I have described is the most abstract
     and minimal object storage system, and all object storage systems
     can be built upon such a storage system.  Therefore, the license
     does nothing with respect to user data at all.

   - If you say yes, several problems arise.
     - Full revocation of access to the cell cannot be provided.  If a
       process running on Bob's machine misbehaves (perhaps due to
       intentional or unintentional action of Bob), how can revocation
       of access occur?
     - Revoking write access while retaining read access to the old
       value is one potential answer, but is insufficient if the data is
       problematic or abusive.
     - What happens if Alice's phone is lost or stolen, or her program
       breaks?  Is Alice vulnerable to a legal attack by Bob for not
       being able to provide the information?
     - At what time is garbage collection of unused cells permitted,
       especially when network access has not been established for some
       time?

 - If you are familiar enough with peer-to-peer actor-model ocap
   systems, you are also familiar with how common it is to be the
   recipient of fine-grained information access.  Given the kind of
   mutable cell described above, are we expected to introduce
   extraction behaviour throughout every program?  Answering "no"
   means that only incomplete information may be available to give a
   user an equivalent experience running on their own computer.
   Answering "yes" can introduce two separate problems:

   - Figuring out how to do so can be an onerous amount of work, and
     is subject to the challenges already described above regarding
     revocation and garbage collection.

   - Doing so incorrectly could result in data leakage, particularly
     since users' data rarely lives in isolation.  One need only look
     at how often "anonymized" data turns out to be traceable to a
     specific individual to see this.
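A toy sketch of the leakage problem, assuming cells shared between
several peers.  The `SharedCell` type and both extraction functions are
hypothetical, not drawn from any real system or from the CAL.  A naive
export of "every cell Bob touched" hands Bob Carol's reply too, while
filtering down to Bob's own entries avoids that leak but returns a
thread with half the conversation missing, which is the
incomplete-information problem from the "no" answer above.

```python
from dataclasses import dataclass, field


@dataclass
class SharedCell:
    """A mutable cell that several peers append to (e.g. a small thread)."""
    entries: list = field(default_factory=list)

    def append(self, writer: str, item: str) -> None:
        self.entries.append((writer, item))

    def writers(self) -> set:
        return {w for w, _ in self.entries}


def naive_extract(cells, user):
    """Hypothetical export: every cell the user ever wrote to, whole."""
    return [list(c.entries) for c in cells if user in c.writers()]


def filtered_extract(cells, user):
    """Hypothetical export: only the user's own entries from those cells."""
    return [[(w, i) for (w, i) in c.entries if w == user]
            for c in cells if user in c.writers()]


thread = SharedCell()
thread.append("bob", "meet at noon?")
thread.append("carol", "carol's private reply")
notes = SharedCell()
notes.append("alice", "alice's own notes")

leaky = naive_extract([thread, notes], "bob")
partial = filtered_extract([thread, notes], "bob")
```

With the data above, `leaky` contains Carol's reply alongside Bob's
question, and `partial` contains only Bob's question with no answer.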

>> In order to uphold the Principle of Least Authority, varied and attenuated
>> access is given out to many different objects in the system.  Which of them
>> are "users", and what rights for what data do they have to extract which
>> information?
>
> This comment shows a key misunderstanding: A user is a *person*, not an
> object that has been granted authority. This is one of the benefits of
> acting at the legal layer; we can apply legal reasoning to the
> people-actors, not the object-actors that live in the system.

Even if this is the case, the program is presumably acting on behalf of
a person or institution that does have legal standing.  Bob could
presumably still take legal action against Alice over a program running
on her phone if something of his is not retained after her phone goes in
the toilet.

We should be building failsafes into systems like this, rather than
imposing additional legal constraints on everyday users who may not be
able to comply, not least because they may lack the technical expertise
to know how to do so.

An encrypted content-addressed system is a good example of an
alternative approach: the data can live anywhere without the server
knowing what its contents are (barring some vulnerability in the
underlying cryptography).  Thus we can get around data-hoarding
problems, since users can always "move" their data to a new location,
and backups are possible.  Individual contractual relationships can
indeed be set up with hosting providers; I can pay someone to keep a
Tahoe-LAFS node live with my content and sue them if they fail to
comply.  But that does not need to go into the software license itself.
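A rough sketch of why this helps, under loud assumptions: the cipher
below is a toy (XOR with a SHAKE-256 keystream, with no authentication)
standing in for the vetted constructions a real system such as
Tahoe-LAFS uses, and `BlobHost`, `upload`, and `download` are
hypothetical names.  The host only ever sees an opaque blob and its
hash, the reader can detect tampering by rehashing, and "moving" data is
just copying the blob to another host under the same address.

```python
import hashlib
import secrets


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY cipher for illustration only: XOR with a SHAKE-256 keystream.
    Not authenticated and not a substitute for a vetted cipher."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class BlobHost:
    """An untrusted host: stores opaque blobs under their SHA-256 hash."""

    def __init__(self):
        self.blobs = {}

    def put(self, blob: bytes) -> str:
        address = hashlib.sha256(blob).hexdigest()
        self.blobs[address] = blob
        return address

    def get(self, address: str) -> bytes:
        blob = self.blobs[address]
        # The reader can check integrity without trusting the host.
        if hashlib.sha256(blob).hexdigest() != address:
            raise ValueError("host returned corrupted data")
        return blob


def upload(host: BlobHost, plaintext: bytes):
    """Encrypt locally, store only ciphertext; return (address, key)."""
    key = secrets.token_bytes(32)
    nonce = secrets.token_bytes(16)
    blob = nonce + keystream_xor(key, nonce, plaintext)
    return host.put(blob), key  # (address, key) acts as the read capability


def download(host: BlobHost, address: str, key: bytes) -> bytes:
    blob = host.get(address)
    nonce, ciphertext = blob[:16], blob[16:]
    return keystream_xor(key, nonce, ciphertext)
```

For example, after `address, key = upload(host_a, b"my content")`,
copying the blob with `host_b.put(host_a.get(address))` yields the same
address, and `download(host_b, address, key)` recovers the content
without either host ever holding the key.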

>> This is not theoretical, this is the direction that sophisticated
>> peer-to-peer, cryptographically secure distributed programs are going
>> (e.g. the programming language E, the work that the Agoric company is
>> doing, and my own work on Spritely Goblins).  While I (and many other
>> security researchers) believe this is the healthiest direction for
>> network security to go, I do not see a good way to be able to uphold
>> this with the requirements for recipient data distribution and key
>> distribution.
>>
>
> I don't really think this is the place to dive deep into software
> architecture, so I'll just say that this reflects a misunderstanding of the
> requirements of the CAL. Least authority system or not, the analysis is:
>   1) Is there a Recipient - a human being that has received licensable
>      material?
>   2) Do I have the Recipient's User Data? Meaning, do I have data that a
>      User has a legal right to in my possession?
>   3) Is that User Data available to me? (For example, not encrypted beyond
>      my ability to identify it?)
>
> If all three of these questions are true, then you need to provide the
> Recipient's User Data back to the Recipient.

I believe I have already described why this is not so simple above.
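To make this concrete, here is a rough sketch (my own framing, not
anything from the CAL text) of the three-question test as code.  The
`Answers` type and `must_return_user_data` function are hypothetical,
and I have added a third "unclear" value (`None`) to each answer.  As
code the test is a trivial conjunction; the trouble is that in the A2
scenario the data sits unencrypted on Alice's phone, so the third answer
looks like yes, while the first two depend on questions the license text
does not settle.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Answers:
    # None means "not clear how to answer for this interaction".
    recipient_received_licensable_material: Optional[bool]
    holds_recipients_user_data: Optional[bool]
    user_data_available_to_me: Optional[bool]


def must_return_user_data(a: Answers) -> Optional[bool]:
    """All three answers yes -> obligation; any no -> none; else unknown."""
    answers = (a.recipient_received_licensable_material,
               a.holds_recipients_user_data,
               a.user_data_available_to_me)
    if any(x is False for x in answers):
        return False
    if all(x is True for x in answers):
        return True
    return None  # cannot be evaluated until someone decides the open questions


# Alice holding the value B1 stored in A2: only the third answer is clear.
a2_case = Answers(None, None, True)
```

Here `must_return_user_data(a2_case)` returns `None`, and every one of
the millions of daily cell exchanges would need its own answers.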

> I will return again to the main point: I have a pretty good knowledge of,
> and deep respect for, the leading edge of capability-based systems. We
> definitely need further technical work in this area in order to provide
> better guarantees to users. But that doesn't mean that the legal layer is
> unnecessary! The legal layer is orthogonal to the technical layer, and each
> reinforces the other. It is a fundamental mistake to only look at one or
> the other.

I believe you are trying to do the right thing and "close the loop" on
fairness for users, which is important work.  In that sense, we have the
same goal.  But I nonetheless believe that software licenses are not the
right place to do it.  And even if they are, I unfortunately see no
assurance that my concerns with this approach have been addressed.