Oftentimes I do a botched copy'n'paste which results in formulations like "the the text", which are really hard to find during proofreading. I searched the web and found a regular expression that does the job, then I put it into a script that takes a text file as its argument, calls grep, and shows the repeated words with color highlighting and line numbers. It's plenty simple and not really worth the effort of putting it up here, but maybe it's useful for some people. It's useful to include the script in the makefile used to generate .ps/.pdf files from the tex file:
all:
	latex paper
	bibtex paper
	latex paper
	latex paper
	dvips paper.dvi
	check_repeats paper.tex
	ps2pdf paper.ps
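For readers who just want the idea, the single-line check can be sketched roughly as follows. This is not the downloadable script itself, only a minimal illustration; the function name `repeats_in_line` is made up for this example, and the pattern assumes GNU grep, since `\b` and the back-reference `\1` in `-E` patterns are GNU extensions.

```shell
#!/bin/sh
# Sketch only: report words that are repeated within a single line.
# Assumes GNU grep; repeats_in_line is a hypothetical helper name.
repeats_in_line() {
    # -n  prefix each hit with its line number
    # -i  ignore case, so "The the" is caught as well
    # --color=auto  highlight the match when writing to a terminal
    # Pattern: a whole word, whitespace, then the same word again.
    grep -E -n -i --color=auto '\b([[:alpha:]]+)[[:space:]]+\1\b' "$1"
}

# Run the check only when a file name was given on the command line.
if [ "$#" -gt 0 ]; then
    repeats_in_line "$1"
fi
```

Saved as an executable file, it would be called with the text file as its only argument. The `\b` anchors keep it from flagging pairs like "words works" or "the then", where one word merely shares letters with its neighbor.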
I hacked an addition that will catch word repeats over linebreaks, i.e. the same word at the end of a line and the beginning of the following line. There should be an easier way to get this working with sed. However, while fiddling with this I found the awk+grep solution, which was faster to implement. Going over the file twice might not be efficient, but I intended to make it obvious that some repeats occur over line breaks. Here's an example that can be downloaded:
> check_repeats test.txt
Checking for repeated words in a line of test.txt:
1 :bla hello hello
4 :repeated words works now now
Checking for repeated words over linebreaks of test.txt:
00002 -00003: ... see see whether checking for
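The check over linebreaks can be sketched along these lines. Note that this illustration uses awk alone rather than the awk+grep combination of the actual script, and that the function name `repeats_over_linebreaks` and the exact output format are assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: report a word that ends one line and starts the next.
# repeats_over_linebreaks is a hypothetical helper name.
repeats_over_linebreaks() {
    awk '
        # A blank line (paragraph break) resets the remembered word.
        NF == 0 { prev = ""; next }
        {
            first = tolower($1)
            # Compare the first word with the last word of the previous line.
            if (prev != "" && first == prev)
                printf "%05d-%05d: ... %s %s\n", NR - 1, NR, prev_raw, $0
            prev_raw = $NF
            prev = tolower($NF)
        }
    ' "$1"
}

# Run the check only when a file name was given on the command line.
if [ "$#" -gt 0 ]; then
    repeats_over_linebreaks "$1"
fi
```

The comparison is case-insensitive but does not strip punctuation, so "end." followed by "End" on the next line is deliberately not reported.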
Download: check_repeats and test.txt
ronni <at> gi <dot> alaska <dot> edu | Last modified: October 26 2011 19:21.