[License-review] Please review revised ModelGo licenses

Moming Duan duanmoming at gmail.com
Sun May 18 05:10:34 UTC 2025


Hi Carlo and Pam,


Thanks for your valuable insights. I will discuss them with our teammates. Here are three points I’d like to clarify.

First, I cannot agree that distilled content has a meaning similar to that of pre-training data. In fact, human-generated data is expected to be exhausted soon (https://epoch.ai/blog/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data), effectively setting an upper limit on the amount of available pre-training data. However, the main reason developers perform distillation is not just data scarcity; it is that model-generated data can significantly improve the robustness and generalization ability of new models. For example, distilled data from DeepSeek R1 has been shown to enhance reasoning performance. This kind of “generalization ability” is precisely what people aim to transfer.

Second, I understand that after multiple generations, attribution information may be lost. However, the intention behind the output provision is simply to remind sub-users at the first generation to respect attribution. For example, if a model owner releases their model under MG-BY, and someone (possibly even the model owner) distills outputs from it into a dataset and publishes that dataset without any attribution notice, then a downstream user who trains a new model on that dataset may unknowingly breach the MG-BY license. Even if their use does not result in a derivative model, it still undermines the licensor’s original intention in choosing MG-BY: in such a case, distilling and republishing as a dataset effectively bypasses the conditions set by the ModelGo License. From an ML perspective, a single generated artwork does not constitute a dataset. Some distilled datasets can be found at: https://huggingface.co/datasets?sort=trending&search=distill

Lastly, I still have not identified which specific clause of the OSD this output provision directly violates.


Best,
Moming


> On 18 May 2025, at 5:51 AM, Pamela Chestek <pamela at chesteklegal.com> wrote:
> 
> Hi Moming,
> 
> I realize you're not trying to impose any license on the use, but you are imposing an obligation that runs with the output, which has never been acceptable in an open source license. I also realize that this is new territory, and just because it wasn't done for software doesn't mean it can't be done for model output, but it is something that needs to be thoughtfully considered.
> 
> What is the justification for it? Why is attribution to the original model something important enough that it has to be said? Is it because so much work went into training the model? The attribution for software is, I suspect, a nod to the concept of attribution of authors in copyrighted works that exists in some countries. But is that rationale appropriate for models, where there is likely no copyrightable authorship in the output?
> 
> I am most concerned about the implications for individual works. As I mentioned in my original email, the words "collection" and "dataset" suggest your intention may have been to limit the duty to downstream models, not generated works, but that is not at all clear in the license.  If I generate a single artistic work from a model under this license, do I have to provide attribution information on my Output? Caution would suggest that is the case, which I think is quite problematic.
> 
> I am troubled by your statement:
> 
> 
>> I recognize that attribution information may be lost over several generations—just as licensing information is often lost when data is crawled from the web and later used to train models. However, it would be unreasonable to respond to this challenge by altering data licenses to allow unrestricted reuse and removal of attribution simply for the sake of convenience or ease of crawling. 
> Licensing information shouldn't be lost in the licensing of software, and a great deal of effort goes into making sure that it isn't. To say that "oh, we know that you'll be out of compliance with the license at some point and we're cool with that" isn't how contracts do or should work. Most people try to abide by their legal obligations and will try to comply, so they will be heavily burdened by this requirement  because it will be impossible to figure out after only a generation or two. And you may be cool with it, but it is a way for someone less forgiving than you to opportunistically claim a breach of the license, putting users at risk of expensive lawsuits.
> 
> So this obligation puts a lot of burden on users, and I am looking for a reason why it's justified.
> 
> Pam
> 
> Pamela S. Chestek
> Chestek Legal
> PLEASE NOTE OUR NEW MAILING ADDRESS
> 4641 Post St.
> Unit 4316
> El Dorado Hills, CA 95762
> +1 919-800-8033
> pamela at chesteklegal
> www.chesteklegal.com
> 
> 
> On 5/15/2025 7:14 AM, Moming Duan wrote:
>> Hi Pam,
>> 
>> 
>> ModelGo Licenses (MG-0, MG-BY, and MG-BY-OS) clearly grant the right to create Derivative Materials, including new models developed via techniques such as distillation. As stated in Section 2.2(b), attribution is not required for internal use of generated content; the obligation only applies when generated datasets are Distributed.
>> This is a lightweight, attribution-style requirement that is easy to comply with, for example, by including proper credit in the dataset README, as commonly seen on: https://huggingface.co/datasets
>> Importantly, this does not mean that the generated dataset must adopt the same license as the original model. 
>> I recognize that attribution information may be lost over several generations—just as licensing information is often lost when data is crawled from the web and later used to train models. However, it would be unreasonable to respond to this challenge by altering data licenses to allow unrestricted reuse and removal of attribution simply for the sake of convenience or ease of crawling. 
>> 
>> The question of who owns generated content remains a legal issue yet to be fully resolved. But if model-generated content (at least when collected in significant quantities) didn’t contain knowledge or reasoning patterns akin to “source code,” why would there be such widespread enthusiasm for model distillation? We don’t see people transferring knowledge from books they wrote in Word into TEXT with the same motivation. A more fitting analogy is users copying code from one repository to another; that, to me, better captures what’s happening. This is my personal opinion.
>> 
>> 
>> Best,
>> Moming
>> 
>> 
>> 
>>> On 15 May 2025, at 11:38 AM, Pamela Chestek <pamela at chesteklegal.com> wrote:
>>> 
>>> This appears to be an attempt at making it a restriction for distillation or synthetic data generation, not, for example, an individual work ("a collection of Output as a dataset"), and I don't doubt that it's well-intended, but I agree that the limitation on Output is inconsistent with open source principles.  It also seems unworkable as the original Output is further reused downstream. How would one know if the original Output is still there several generations later?
>>> 
>>> Pam
>>> 
>>> Pamela S. Chestek
>>> Chestek Legal
>>> 4641 Post St.
>>> Unit 4316
>>> El Dorado Hills, CA 95762
>>> +1 919-800-8033
>>> pamela at chesteklegal
>>> www.chesteklegal.com
>>> 
>>> 
>>> On 5/14/2025 7:09 AM, Carlo Piana wrote:
>>>> Josh,
>>>> 
>>>> Sorry for the long silence.
>>>> 
>>>> I think that the new version of the ModelGo license does not seem to address the concern I have expressed against it, following up on your own comment on output (now in 2.bb). I think that imposing anything on the output of the model is against the OSD as it is a restriction on the use of the licensed subject matter.
>>>> 
>>>> So no, I am confused at how this new text should be addressing the above concern.
>>>> 
>>>> In a separate thread I have expressed perplexity on certain clauses, these seem to have been removed, so no issue on that end.
>>>> 
>>>> This applies to the updated versions.
>>>> 
>>>> Cheers
>>>> 
>>>> Carlo
>>>> 
>>>> 
>>>> 
>>>> ----- Original message -----
>>>>> From: "Josh Berkus" <josh.berkus at opensource.org>
>>>>> To: "License submissions for OSI review" <license-review at lists.opensource.org>
>>>>> Sent: Tuesday, 15 April 2025 1:46:45
>>>>> Subject: [License-review] Please review revised ModelGo licenses
>>>>> Carlo, Pam, Eric, Shuji,
>>>>> 
>>>>> Moming has re-submitted revised versions of his licenses based on your
>>>>> feedback.  Please check them when you can and make sure that your
>>>>> concerns about the licenses have been addressed.
>>>>> 
>>>>> --
>>>>> -- Josh Berkus
>>>>> OSI Board Member
>>>>> 
>>>>> _______________________________________________
>>>>> The opinions expressed in this email are those of the sender and not necessarily those of the Open Source Initiative. Communication from the Open Source Initiative will be sent from an opensource.org email address.
>>>>> 
>>>>> License-review mailing list
>>>>> License-review at lists.opensource.org
>>>>> http://lists.opensource.org/mailman/listinfo/license-review_lists.opensource.org
