Do you have any thoughts on the new Really Simple Licensing (#RSL) standard for AI bots?
At first glance, this looks like a great initiative, Really Simple Licensing: https://rslstandard.org
It's a simple, standard way to embed licensing terms in robots.txt, #RSS feeds, web pages and more, so that internet crawlers (especially #AI bots) can understand the author's intentions. It also supports collective #licensing platforms.
This is a critical part of the plan I put forward in my blog post here https://dougiamas.com/how-we-could-build-a-better-future-for-creators-in-the-age-of-ai/ where such labelling should be legislated.
But before we jump in, what do you think of #RSL as a #standard?

How to use Really Simple Licensing (RSL) to block all AI crawlers
RSL is a new initiative by a group of big internet publishers that seeks to define the conditions under which AI crawlers can harvest their content. Their guide describes the various ways content can be made available, including for free or for a paid royalty, but only by digging deeper into their reference material was I able to figure out how to prohibit all usage.
Your robots.txt needs to link to an XML file, like this:
License: https://your-domain.tld/rsl.xml
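In context, a minimal robots.txt carrying that directive might look like the sketch below (the domain is a placeholder, and the User-agent/Allow lines are ordinary robots.txt rules shown only for context; the License line is the RSL part):

```text
# Ordinary crawling rules
User-agent: *
Allow: /

# RSL: where crawlers can find the licence terms
License: https://your-domain.tld/rsl.xml
```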
Then in that file you want this:
<rsl xmlns="https://rslstandard.org/rsl">
  <content url="/" server="https://rslcollective.org/api">
    <license>
      <prohibits type="usage">all</prohibits>
    </license>
  </content>
</rsl>
That’s it.
If you want to be more liberal, you could change the <prohibits> line to:
<permits type="usage">search</permits>
That will let them use the content for search, which is probably quite similar to what traditional search engines do. More details in their reference docs.
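Putting that together, a complete rsl.xml permitting search-only usage would look like this (a sketch assuming the same structure as the blocking example above, with only the prohibits/permits element swapped):

```xml
<rsl xmlns="https://rslstandard.org/rsl">
  <content url="/" server="https://rslcollective.org/api">
    <license>
      <permits type="usage">search</permits>
    </license>
  </content>
</rsl>
```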
Optionally, to dispel any plausible deniability, you can also add a link to rsl.xml as a Link header in every HTTP response:
Link: <https://example.com/rsl.xml>; rel="license"; type="application/rsl+xml"
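If you control the application rather than the web server config, one way to emit that header on every response is a small handler like this stdlib-only Python sketch (the URL and port are placeholders; the header value follows the format shown above):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical location of your rsl.xml
RSL_URL = "https://example.com/rsl.xml"

def rsl_link_header(url: str) -> str:
    """Build the Link header value advertising the RSL licence file."""
    return f'<{url}>; rel="license"; type="application/rsl+xml"'

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        # Attach the RSL Link header to every response
        self.send_header("Link", rsl_link_header(RSL_URL))
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("", 8000), Handler).serve_forever()
```

In practice you'd more likely add the header once in your reverse proxy (nginx, Caddy, etc.) than in application code.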
It’s still too early to say whether AI crawlers will respect the terms of the licence any publisher specifies; it’ll probably take a court case or two to sort that out.
PieFed has added RSL to its code just now. Instance admins who wish to disable RSL can set the ALLOW_AI_CRAWLERS environment variable to anything.