Discussion
Minoru
@minoru@functional.cafe · 4 weeks ago

I was curious to know where the boundary lies between similarity and copyright infringement, specifically in the context of using large language models for programming. https://writing.kemitchell.com/2025/01/16/Provisional-Guidance-LLM-Code is just what I needed: a high-level explanation that there is no such rule (yet), and of what can be done in its absence.

Kyle's prose is rich and always takes me a while to read and digest, so if you're in a hurry, here are my takeaways:

  1. there is no specific number of characters/tokens/lines beyond which generated code becomes an infringement

  2. there's a continuum from autocompletion through generation to authorship. If one is auto-completing a simple line of code, it's probably fine. If one generates the same boilerplate that half the projects in the world contain, it's probably fine too, but make sure it's really boilerplate and nothing original. If one asks for a complete implementation of some algorithm, the risks are much higher

  3. one should document everything done by an LLM, to be used later as evidence of noninfringement. The LLM's output should be stored in separate commits that record the prompt. The human's edits should go in their own commits, to clearly delineate what was generated from what was authored (see the sketch after this list)
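As an illustration of what that separation could look like in practice, here's a minimal sketch that shells out to git. The file name, commit-message convention, prompt, and model name are all made-up examples of mine, not anything the post prescribes:

```python
import subprocess

def commit(paths: list[str], message: str) -> None:
    """Stage the given paths and record them as a single commit."""
    subprocess.run(["git", "add", *paths], check=True)
    subprocess.run(["git", "commit", "-m", message], check=True)

# Hypothetical file, purely for illustration.
GENERATED_FILE = "src/parser.py"

# 1. Commit the LLM's output verbatim, recording the prompt (and
#    ideally the model/version) in the commit message as evidence.
commit(
    [GENERATED_FILE],
    "parser: add LLM-generated skeleton\n\n"
    "Prompt: 'Write a recursive-descent parser for arithmetic expressions.'\n"
    "Model: example-model-v1 (hypothetical)",
)

# ... the human then edits the file by hand ...

# 2. Commit the human edits separately, so generated and authored
#    code stay clearly delineated in the history.
commit([GENERATED_FILE], "parser: rework error handling by hand")
```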

Of course, there are still many more questions to be answered about LLMs: potential infringements during training, the efficiency of training and inference compared to typing the code yourself, as well as more philosophical questions of where this takes programming as an activity.

#LargeLanguageModels #Law

Minoru
@minoru@functional.cafe replied · 3 weeks ago

I continue reading up on the legal ramifications of LLMs.

https://matthewbutterick.com/chron/will-ai-obliterate-the-rule-of-law.html takes a rather philosophical angle compared to Mitchell's post. It generalises "copyright laundering" into "behaviour laundering". Current laws are written for humans and treat machines as instruments. These instruments can't be faulted; their operators take responsibility. However, if a machine appears that can operate so autonomously that its agency is on the level of a human's, we have a problem: the machine can do whatever it wants, and there is no legal recourse!

Butterick then goes on to analyse possible solutions. It feels to me that history will end up creating its own solution rather than picking from the list, but I can't pinpoint what exactly it might be.

#LargeLanguageModels #Law
