
Yesterday, I crafted a RegEx that finds double words (such as “the the”). It’s pretty useless if you write or edit in a relatively recent word processor, and maybe I’d better start doing an edit pass in one of those, but I share it here in case it’s useful to you.
(\b[A-Za-z'‘’]+\b) \1
Don’t do a replace-all with that until you’re confident it’s not picking up false positives … and probably not even then.
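To see why a blind replace-all is risky, here is a rough Python sketch that only lists matches with a little surrounding text so you can review them by eye. The helper name `find_doubles`, the sample sentence, and the context width are made up for illustration, and the pattern assumes the letter range is written as `A-Za-z` with straight and curly apostrophes allowed.

```python
import re

# Assumed form of the pattern: letters plus straight and curly apostrophes.
DOUBLE_WORD = re.compile(r"(\b[A-Za-z'‘’]+\b) \1")

def find_doubles(text, context=15):
    """Return (matched text, surrounding snippet) pairs for manual review."""
    hits = []
    for m in DOUBLE_WORD.finditer(text):
        start = max(m.start() - context, 0)
        end = min(m.end() + context, len(text))
        hits.append((m.group(0), text[start:end]))
    return hits

# Hypothetical sample: one real typo, one partial-word hit, one legitimate repeat.
sample = "I saw the the cat, and the then-famous dog had had enough."
for matched, snippet in find_doubles(sample):
    print(repr(matched), "in", repr(snippet))
```

On this sample the pattern reports three hits: the genuine “the the”, the “the the” at the start of “the then-famous” (nothing stops the second copy from being the beginning of a longer word), and “had had”, which is perfectly good English. If the partial-word hits bother you, one possible adjustment is a trailing `\b` after `\1`, but legitimate repeats like “had had” will still need a human eye.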
Not every RegEx implementation will allow you to reference a capture group inside the search term (a backreference, the \1 part), so be sure to intentionally put a “the the” in the text somewhere to make sure it’s working correctly for you.
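If your tool happens to be Python, the planted-text check might look something like this minimal sketch. Python’s `re` module does support `\1` inside the pattern; the test sentences and the `supports_backreference` name are just examples, and the pattern again assumes the `A-Za-z` form of the letter range.

```python
import re

# Assumed form of the pattern; \1 refers back to the first capture group.
pattern = r"(\b[A-Za-z'‘’]+\b) \1"

def supports_backreference(planted, control):
    """True if the planted double word is found and the clean text is not."""
    found = re.search(pattern, planted)
    clean = re.search(pattern, control)
    return found is not None and clean is None

# Hypothetical test sentences: one with a planted "the the", one without.
planted = "This sentence has the the planted double."
control = "This sentence has no repeats."
print(supports_backreference(planted, control))
```

In an editor’s search box the same idea applies: search the planted text first, and if nothing is found, the engine may not support backreferences or may use a different syntax for them.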
If I get a moment, I’ll explain what it’s doing so you can make adjustments if you need to.