We were comparing diff-viewer tools at work today

June 3, 2014

Say someone changed some code and, at the same time, used line breaks to reformat it. Most diff viewers handle that badly: they flag every reformatted line as changed, even though most of those changes are just formatting.

What we want is for our diff viewer to treat line breaks as just more whitespace.

Here's one good way to handle it. In "C like" languages, whitespace is mostly insignificant: symbols like parentheses, semicolons, and curly braces delimit the code, and whitespace mainly separates tokens. So we can tokenize the input, put each token on its own line, and then diff that instead.

Here are the two test files.

test1.c

int main(int argc, char** argv) {
  if (a && b && c && d) {
    printf("Hello diff!\n");
  }
}

test2.c

int main(int argc, char** argv) {
  if (a 
      && b 
      && c 
      && d) {
    printf("Hello diff!\n");
  }
}

Running diff before and after applying our tokenizing trick:

$ diff -q test1.c test2.c
Files test1.c and test2.c differ

$ diff -q \
> <(tr -s '[[:space:]]' '\n' < test1.c) \
> <(tr -s '[[:space:]]' '\n' < test2.c)
$ 

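In the second case diff prints nothing, which means the two token streams are identical: the files differ only in whitespace. One caveat: tr splits on every whitespace character, including those inside string literals, so a whitespace change inside a string such as "Hello diff!\n" would be hidden too.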
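The `diff -q` check on the two token streams can also be sketched directly in C, without building the streams first. The hypothetical `same_tokens` function below walks both strings at once, treating any run of whitespace as a single separator. It is an illustrative sketch, not how diff works internally, and it is slightly more lenient than the pipeline above: leading and trailing whitespace are ignored entirely, whereas `tr` turns leading whitespace into an empty first line that diff would notice.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helper: true if `a` and `b` hold the same sequence of
 * whitespace-separated tokens. Any run of whitespace counts as one
 * separator; leading and trailing whitespace are ignored. */
bool same_tokens(const char *a, const char *b)
{
    for (;;) {
        /* skip the separator run in front of the next token */
        while (isspace((unsigned char)*a)) a++;
        while (isspace((unsigned char)*b)) b++;

        /* if either stream is exhausted, both must be */
        if (*a == '\0' || *b == '\0')
            return *a == *b;

        /* compare one token character by character; a mismatch here
         * also catches b's token ending before a's does */
        while (*a != '\0' && !isspace((unsigned char)*a)) {
            if (*a != *b) return false;
            a++;
            b++;
        }

        /* a's token has ended, so b's token must end here too */
        if (*b != '\0' && !isspace((unsigned char)*b))
            return false;
    }
}
```

Because only whitespace separates tokens here, `same_tokens("a && b", "a&&b")` returns false even though both are the same C expression; the `tr` trick has the same limitation.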

Contact me at byronka (at) msn.com