[License-review] Please review revised ModelGo licenses

Carlo Piana carlo at piana.eu
Tue May 20 15:21:40 UTC 2025


Dear Moming,

First, I have converted this message to text only because the HTML text was totally unreadable on a white background. Can we please avoid using colors to mark answers, as it is bad practice and a usability monstrosity?

A few comments inline, on one single issue upon which I have commented and found no satisfactory answer.


----- Original message -----
> From: "Moming Duan" <duanmoming at gmail.com>
> To: "License submissions for OSI review" <license-review at lists.opensource.org>
> Sent: Tuesday, 20 May 2025 4:34:58
> Subject: Re: [License-review] Please review revised ModelGo licenses

[...]

> Response to the Model Output Provision and the Editor Hypothetical:
> [R: I agree with Moming that the model output provision does not violate any of
> the 10 Open-Source Definition criteria. The OSI's responses appear to be
> against this model output provision, mostly on the basis that it restricts the
> conditions of use of the output (ie the “editor argument”).

I reaffirm what Richard said earlier:

> I don't think it violates any specific clause of the OSD. However, the
> OSI does not say that literal conformance to the OSD is the only
> requirement for approval of a license:

as this is consistent with long-standing practice.


> I think I would counter that open-source software is not free (or ‘libre’)
> software (in FSF parlance). To quote, "Why Open Source Misses the Point of Free
> Software" (https://www.gnu.org/philosophy/open-source-misses-the-point.html).
> Further, "the criteria for open source are concerned solely with the use of the
> source code. Indeed, almost all the items in the Open Source Definition are
> formulated as conditions on the software's source license rather than on what
> users are free to do."

This argument has no bearing on the discussion. You are applying for OSI approval of a license. Quoting GNU on why OSI is misguided seems at least a bit out of touch. Besides, trying to drive a wedge between Open Source and Software Freedom is also a stretch. The OSD itself is clearly derived from the Debian Free Software Guidelines (!). Roots aside, OSI's mission is to foster Software Freedom by making Open Source Software successful.

> If so, then imposing conditions on use of the output is not against the 10 OSD
> criteria (does not appear to be in violation of OSD6 and OSD8).

See the first answer.


> I would even say that an editor which required any file created with the editor
> to have an attribution notice could be open-source, but not free. Of course, in
> practice, such an attribution requirement is silly for editors, but as Moming
> pointed out, an LLM is not a code editor.

It's an analogy, and as with any analogy, if you dig deep enough, you will always find logical fallacies. However, it points to a real issue. The case still needs to be made that the author of an LLM can control the output of a query to it, that this output is any different from the output of an editor, and that such control is necessary to ensure freedoms.

The basic idea is that you must not impose unnecessary friction, and that compliance must be possible by relying merely on the license and the distribution artifact, without requesting anything beyond the subject matter and its derivatives, not even a thank-you note, for using the software.

If you start imposing conditions on the output, are those who get hold of the output also bound to keep the attribution? If so, you are imposing a copyleft condition on something you don't control as copyright holder, and the condition is not aimed at preserving the Open Source status of the artifact, but at extending control, through copyright or otherwise, over something else created by a licensee. That is a restriction on the use of the software and not a condition made to preserve the Freedoms granted on the software.

If not, it's totally moot, as the attribution clause would not survive the first act of distribution, even admitting that the artifact is copyright subject matter in the first place. I doubt that it is, and if it is not, a copyleft condition would be moot as well. I think this matches Pam's argument, to some extent.
 

> More importantly, while the principles of ‘free’ software point in favour of
> having no conditions on use/output, the unique nature of LLMs also mean that
> some conditions on use/output are desirable. 

This is evidently begging the question (petitio principii).


> And although this is a
> value-judgment (on how AI-created content should be treated), I would say that
> (1) adherence to the 4 FSF freedoms is itself a value-judgment in favour of
> freedom (understood a certain way) in software, and (2) labelling AI-generated
> content is something that is compatible with, and necessary, for the 4 FSF
> freedoms to be desirable.

Another instance of begging the question.

>  I would analogise this to how even the right of free
> speech is limited in specific contexts. This point is more germane to the
> “advertisement” requirement (previously 2.4(b), which had the purpose of
> targeting misinformation and misleading claims), but also applies to the
> current Section 2.2(b) (which has the narrower purpose of ensuring
> sustainability in open-source LLM development).

Having the purpose of ensuring sustainability in Open Source (and not open-source, please) LLM development is not a good reason to impose conditions on anything other than the distribution artifact and derivatives. By that token, one could argue that proprietary exploitation is required to sustain Open Source development, which sort of defeats the concept.

To my knowledge, the 4-clause BSD license has not been approved, unlike the versions with three or fewer clauses, the difference being the advertisement clause.


> In the most recent threads, there was also some discussion about whether
> copyright or patent law grants the Licensor rights/control over the output. My
> view is that copyright/patent law does not need to grant the Licensor
> rights/control over the output, since the License is imposing conditions on the
> use of the Model (ie Licensed Materials), and is making the License grant
> conditional on compliance with these conditions of use. I would suggest not
> approaching the issue from the point of view of IP rights over output (since
> that is a thorny issue - requires further analysis), but merely as an issue of
> whether imposing certain conditions on use is OSD-compliant.

It is an established principle that the conditions must be strictly on the subject matter and not restrict the output. As Richard pointed out, the analysis of OSI must go beyond what is strictly and verbatim imposed by the OSD. As I said, OSI has consistently rejected licenses that imposed obligations beyond the distribution of the software or derivatives. 


[...]

---

On this subject, in a previous message you argued, far more to the point:

> Second, I understand that after multiple generations, attribution information may be lost. However, the intention behind the output provision is simply to remind sub-users at the first generation to respect attribution. For example, if a model owner releases their model under MG-BY, and someone (possibly even the model owner) distills outputs from it into a dataset and publishes that dataset without any attribution notice, then a downstream user who trains a new model on that dataset may unknowingly breach the MG-BY license. If their use does not result in a derivative model, it still undermines the licensor’s original intention in choosing MG-BY, in such a case, distilling and republishing as a dataset effectively bypasses the conditions set by the ModelGo License. From an ML perspective, a single generated artwork does not constitute a dataset. Some distilled datasets can be found at: https://huggingface.co/datasets?sort=trending&search=distill


Again, there is a lot of begging the question. The "economy of LLMs" seems to transpire, but the case for needing to control the dataset has still not been made.

If the distilled dataset is not a direct derivative, but builds on accrued knowledge, the same is true of all Open Source software, and even of non-Open Source software, as learning and deriving knowledge is [f|F]ree, as long as one neither just copies nor uses the work of others to make something larger.

You are supposed, conversely, to establish that by distilling the information into a dataset, this dataset is something you have a right to control insofar as it is a derivative of the model, and not just output created through interaction with, or knowledge extracted from, the model. From my understanding, the examples are curated and are a work of ingenuity, deliberation and labor by those who made them, which is exactly the intended use of a model (offer an input in order to obtain an output).

In order to ensure Freedom, we accept that conditions are imposed to limit "absolute" Freedom, either on the artifact itself, or on derivatives thereof -- and this is copyleft (which is not a necessary trait of Open Source). But we need to strike a balance between conditions that are acceptable and conditions that impose too much of a toll on everyone else, and this balance has been found in allowing conditions only if they apply to the distributed subject matter and parts thereof, including derivatives, i.e., works that contain and build upon other people's work.


All the best,

Carlo
Speaking in his personal capacity.


> Thanks,
> Moming

> _______________________________________________
> The opinions expressed in this email are those of the sender and not necessarily
> those of the Open Source Initiative. Communication from the Open Source
> Initiative will be sent from an opensource.org email address.


