This might have been noticed and discussed before, but in case you don't know:
Wallabag — an open-source alternative to Pocket, with a paid offering, that stores copies of web pages for later reading or archival — is actively trying to evade legitimate mitigations that website owners put in place to restrict bot traffic.
Wallabag's crawler will happily pretend it is a human visitor — presenting itself as an ordinary browser instead of saying, you know, the truth — because that helps bypass said mitigations.
For the same reason, it will proudly pretend it is coming from a Google results page.
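To make the two tricks concrete, here is a hypothetical sketch (not Wallabag's actual code) of the difference between a crawler that identifies itself and one that masquerades as a person clicking through from Google. The bot name and URL are made up for illustration.

```python
# Hypothetical sketch: honest crawler headers vs. deceptive ones.

# An honest crawler announces what it is and where to learn more:
honest_headers = {
    "User-Agent": "ExampleBot/1.0 (+https://example.org/bot-info)",
}

# A deceptive crawler copies a real browser's User-Agent string and fakes
# a Referer, so each request looks like a human arriving from Google:
deceptive_headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                  "AppleWebKit/537.36 (KHTML, like Gecko) "
                  "Chrome/120.0 Safari/537.36",
    "Referer": "https://www.google.com/",
}

# Mitigations that filter on User-Agent or Referer cannot tell the
# deceptive request apart from ordinary browser traffic.
print("honest:", honest_headers["User-Agent"])
print("deceptive:", deceptive_headers["Referer"])
```

A site owner who blocks unknown bots by User-Agent will wave the second request straight through, which is the whole point of the deception.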
And when confronted about this behaviour, the team acts like there is nothing wrong.
If these tactics sound familiar, that might be because they are exactly what shady bots and AI crawlers do to extract valuable work from the web and sell it for profit. And while we know to expect nothing from big tech, Wallabag presents itself as an ethical alternative and even says on its website:
because your privacy is important, we don’t resell your data. We live solely on your subscriptions.
But I guess only paying customers deserve privacy and control over their data. So it's probably fine to grab other people's work without permission and resell it through their paid offering?
It's not the scandal of the century. On the contrary, it's quite common. If anything, you can treat this as one example of the general lack of understanding of, and/or care for, consent that runs rampant in open source communities.
They will pretend they care until they realize they have to put in some effort or change their plans.
Even Google, Bing and Facebook announce themselves with a meaningful User-Agent and honor robots.txt directives :')
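For reference, honoring robots.txt takes almost no effort — Python's standard library does it in a few lines. This is a minimal sketch; the bot name and the rules below are made up for illustration.

```python
# Minimal sketch of a crawler honoring robots.txt, using Python's
# standard library robots.txt parser.
from urllib.robotparser import RobotFileParser

# Rules a site owner might publish at https://example.com/robots.txt:
rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)

# A well-behaved crawler checks before fetching, under its own name:
print(parser.can_fetch("ExampleBot/1.0", "https://example.com/private/page"))  # False
print(parser.can_fetch("ExampleBot/1.0", "https://example.com/articles/42"))   # True
```

That's the entire cost of being honest: one check per URL, and skipping the pages the site owner asked you not to take.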