Discussion
Minoru
@minoru@functional.cafe · 4 weeks ago

I was curious to know where the boundary lies between similarity and copyright infringement, specifically in the context of using large language models for programming. https://writing.kemitchell.com/2025/01/16/Provisional-Guidance-LLM-Code is just what I needed: a high-level explanation that there is no such rule (yet), and of what can be done in its absence.

Kyle's prose is rich and always takes me a while to read and digest, so if you're in a hurry, here are my takeaways:

  1. there is no specific number of characters/tokens/lines beyond which generated code becomes an infringement

  2. there's a continuum from autocompletion through generation to authorship. If one is auto-completing a simple line of code, it's probably fine. If one generates the same boilerplate that half the projects in the world contain, it's probably fine too, but make sure it's really boilerplate and nothing original. If one asks for a complete implementation of some algorithm, the risks are much higher

  3. one should document everything done by an LLM, to be used later as evidence of noninfringement. The LLM's output should be stored in separate commits that record the prompt. The human's edits should go in their own commits, to clearly delineate what was generated from what was authored (see the sketch after this list)
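As an illustration of what that separation could look like in practice, here's a minimal sketch that shells out to git. The file name, commit-message convention, prompt, and model name are all made-up examples of mine, not anything the post prescribes:

```python
import subprocess

def commit(paths: list[str], message: str) -> None:
    """Stage the given paths and record them as a single commit."""
    subprocess.run(["git", "add", *paths], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)

# Hypothetical file, purely for illustration.
GENERATED_FILE = "src/parser.py"

# 1. Commit the LLM's output verbatim, recording the prompt (and
#    ideally the model/version) in the commit message as evidence.
commit(
    [GENERATED_FILE],
    "parser: add LLM-generated skeleton\n\n"
    "Prompt: 'Write a recursive-descent parser for arithmetic expressions.'\n"
    "Model: example-model-v1 (hypothetical)",
)

# ... the human then edits the file by hand ...

# 2. Commit the human edits separately, so generated and authored
#    code stay clearly delineated in the history.
commit([GENERATED_FILE], "parser: rework error handling by hand")
```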

Of course, there are still many more questions to be answered about LLMs: potential infringements during training, the efficiency of training and inference compared to typing the code yourself, as well as more philosophical questions of where this takes programming as an activity.

#LargeLanguageModels #Law

Minoru
@minoru@functional.cafe replied · 3 weeks ago

I continue reading up on the legal ramifications of LLMs.

https://matthewbutterick.com/chron/will-ai-obliterate-the-rule-of-law.html takes a rather philosophical angle compared to Mitchell's post. It generalises "copyright laundering" into "behaviour laundering". Current laws are written for humans and treat machines as instruments. These instruments can't be faulted; their operators take responsibility. However, if a machine appears that can operate so autonomously that its agency is on the level of a human's, we have a problem: the machine can do whatever it wants, and there is no legal recourse!

Butterick then goes on to analyse possible solutions. It feels to me that history will end up creating its own solution rather than picking from the list, but I can't pinpoint what exactly it might be.

#LargeLanguageModels #Law
