Coarse Comparison / Line Based Delimiters




    I have two massive (>150 MB) .mbox files that I am trying to compare. For those not familiar with mbox files, they are basically raw-text emails concatenated together, see:

    I am using a text compare, which works to some extent. However, I am having problems getting the files to line up automatically, because many lines are common to totally different emails in the file (primarily the email headers, where much of the metadata matches in all cases). As a result, the comparison is too fine and produces mismatches all over the place. Is there some way I can get Beyond Compare to match at a coarser level?

    I may be able to use delimiters in the file, but I have no idea how to configure Beyond Compare to match on these line-based delimiters.
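
    To illustrate the kind of coarse matching I mean, here is a rough sketch done outside Beyond Compare in Python. It is not a Beyond Compare feature or setting. It assumes each email starts at a line beginning with "From " (the usual mbox separator) and that body lines starting with "From " have been escaped, which may not hold for every file. The names split_messages and coarse_diff are made up for this example.

```python
import difflib
import hashlib


def split_messages(text):
    """Split mbox text into messages.

    Assumption: a new message begins at every line starting with "From ".
    """
    messages, current = [], []
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") and current:
            messages.append("".join(current))
            current = []
        current.append(line)
    if current:
        messages.append("".join(current))
    return messages


def coarse_diff(left_text, right_text):
    """Align two mbox texts message by message instead of line by line.

    Each message is reduced to a hash, so only whole messages can match.
    Returns the non-equal opcodes as (tag, i1, i2, j1, j2) tuples, where
    the indices are message positions in the left and right files.
    """
    left = split_messages(left_text)
    right = split_messages(right_text)
    left_keys = [hashlib.sha1(m.encode("utf-8")).hexdigest() for m in left]
    right_keys = [hashlib.sha1(m.encode("utf-8")).hexdigest() for m in right]
    matcher = difflib.SequenceMatcher(None, left_keys, right_keys, autojunk=False)
    return [op for op in matcher.get_opcodes() if op[0] != "equal"]
```

    Because whole messages are compared, identical header lines in different emails can no longer cause false alignments. The trade-off is that two emails differing by a single character are reported as entirely different.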

    Any and all help would be much appreciated.