Terence Eden
@Edent@mastodon.social · 11 hours ago

🆕 blog! “Removing "/Subtype /Watermark" images from a PDF using Linux”

Problem: I've received a PDF which has a large "watermark" obscuring every page.

Investigating: Opening the PDF in LibreOffice Draw allowed me to see that the watermark was a separate image floating above the others.

Manual Solution: Hit page down, select image, delete, repeat 500 times. …

👀 Read more: https://shkspr.mobi/blog/2026/01/removing-subtype-watermark-images-from-a-pdf-using-linux/
⸻
#LLM #pdf #python

Michael Horne
@recantha@mastodon.social replied · 4 hours ago

@Edent Why did it have a watermark across the pages, out of interest?

Terence Eden
@Edent@mastodon.social replied · 4 hours ago

@recantha it was a pre-print review copy. Totally legit for them to watermark it - but made it unreadable.

MarjorieR
@marjolica@social.linux.pizza replied · 11 hours ago

@Edent given you have vibe coded a script to remove the watermark, can we also assume that the 'large watermark' is there to tell everyone that the contents of the PDF were originally generated by an LLM?
If so, can't have viewers seeing that, can we?

Terence Eden
@Edent@mastodon.social replied · 10 hours ago

@marjolica eh? I've no idea what you're talking about.
It was a pre-print book with a publisher's watermark.

David Huggins-Daines
@dhd6@jasette.facil.services replied · 11 hours ago

@Edent Hi! The watermarks are everything from /Artifact (this is actually the first argument to the BMC operator) to EMC. They don't have a length, but once you remove them you have to fix up the /Length on the enclosing content stream. However! Since you decompressed the stream, you can just set the /Length to 0, and any PDF viewer can figure it out.
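
As a rough illustration of the approach described above, here is a minimal Python sketch that drops watermark blocks from an already-decompressed content stream and computes the new byte count for /Length. It assumes each watermark is wrapped as `/Artifact << ... /Subtype /Watermark ... >> BDC ... EMC`, with no nested marked-content sections and no inline binary data that happens to contain the bytes `EMC`. The function name and the sample stream are made up for the example; this is not the script from the blog post.

```python
import re

# Assumption: the property dictionary is flat (no nested << >>) and the
# block contains no nested BMC/BDC ... EMC pairs.
WATERMARK_BLOCK = re.compile(
    rb"/Artifact\s*<<[^>]*/Subtype\s*/Watermark[^>]*>>\s*BDC\b.*?\bEMC\b",
    re.DOTALL,
)


def strip_watermarks(stream: bytes) -> bytes:
    """Return the content stream with every matching watermark block removed."""
    return WATERMARK_BLOCK.sub(b"", stream)


# Hypothetical decompressed content stream: one real image, one watermark, some text.
sample = (
    b"q\n1 0 0 1 0 0 cm\n/Im0 Do\nQ\n"
    b"/Artifact <</Subtype /Watermark /Type /Pagination >>BDC\n"
    b"q\n/Im1 Do\nQ\n"
    b"EMC\n"
    b"BT /F1 12 Tf (Hello) Tj ET\n"
)

cleaned = strip_watermarks(sample)
new_length = len(cleaned)  # the value the stream's /Length entry would need
```

Writing the cleaned bytes back into the PDF object is left out of the sketch.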

Terence Eden
@Edent@mastodon.social replied · 10 hours ago

@dhd6 you and @jleedev have both come up with interesting and different alternatives. Thanks!

Don Thompson
@guardeddon@mas.to replied · 10 hours ago

@Edent @dhd6 @jleedev
I have done something similar with Inkscape to remove watermarks and, worse, redaction blocks. It was some time ago, and maybe only twice. Inkscape can be driven by script, but it operates on a single page at a time, so I used pdftk to split the doc into single pages, ran the Inkscape script on each, and merged the pages again with pdftk. Apologies that I can't dig out and share specifics.
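
For anyone wanting to try the split-and-merge part of that workflow, a minimal Python sketch of the two pdftk invocations might look like the following. It only builds the command lines, using pdftk's `burst` and `cat` operations. The per-page Inkscape step is deliberately left out, since the post gives no specifics, and the helper names and file names are hypothetical.

```python
def burst_command(src: str, pattern: str = "page_%04d.pdf") -> list[str]:
    """Command that splits src into one PDF per page, named after pattern."""
    return ["pdftk", src, "burst", "output", pattern]


def merge_command(pages: list[str], dest: str) -> list[str]:
    """Command that concatenates the page files, in sorted name order, into dest."""
    return ["pdftk", *sorted(pages), "cat", "output", dest]


# Hypothetical file names, for illustration only.
split_cmd = burst_command("book.pdf")
join_cmd = merge_command(["page_0002.pdf", "page_0001.pdf"], "clean.pdf")
```

Each list can be passed to `subprocess.run(cmd, check=True)` once pdftk is installed; the zero-padded pattern keeps the sorted order equal to the page order.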

David Huggins-Daines
@dhd6@jasette.facil.services replied · 10 hours ago

@Edent @jleedev if you want to better understand the structure of a PDF you can use my software 😀 but it's strictly read only https://dhdaines.github.io/playa/latest/

PLAYA

PLAYA ain't a LAYout Analyzer
jleedev@mastodon.sdf.org
@jleedev@mastodon.sdf.org replied · 11 hours ago

@Edent Replacing the content with whitespace works just as well without having to fix up the /Length.
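
The whitespace variant can be sketched in Python like this: each watermark block is overwritten with the same number of space bytes, so the stream's byte count (and therefore its existing /Length) stays the same. The sketch assumes the content stream is already decompressed and that each watermark is wrapped as `/Artifact << ... /Subtype /Watermark ... >> BDC ... EMC` with no nested marked-content sections. The function name and sample bytes are invented for the example.

```python
import re

# Assumption: flat property dictionary, no nested BMC/BDC ... EMC pairs.
WATERMARK_BLOCK = re.compile(
    rb"/Artifact\s*<<[^>]*/Subtype\s*/Watermark[^>]*>>\s*BDC\b.*?\bEMC\b",
    re.DOTALL,
)


def blank_watermarks(stream: bytes) -> bytes:
    """Overwrite each watermark block with spaces of the same byte length."""
    return WATERMARK_BLOCK.sub(lambda m: b" " * len(m.group(0)), stream)


# Hypothetical decompressed content stream.
sample = (
    b"q\n/Im0 Do\nQ\n"
    b"/Artifact <</Subtype /Watermark /Type /Pagination >>BDC\n"
    b"q\n/Im1 Do\nQ\n"
    b"EMC\n"
    b"BT /F1 12 Tf (Hello) Tj ET\n"
)

blanked = blank_watermarks(sample)
```

Because whitespace between operators is ignored in a content stream, the padded result should draw the same as the stripped one.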
