3000 Page .dat File in Text comparison help



    I've been playing around with Beyond Compare 2 and 4 and reading a fair number of threads for a couple of days now. I'm mainly working with BC2 at the moment because I wanted to see if the Data Viewer could help me. It seems like really cool software. I've tried a lot of RegEx and played around with the delimiters, but I'm pretty sure it's not helping with what I want to do. Basically, there's a website that pumps out a .dat file every day with a list of names, emails, and phone numbers. I want to be able to see the changes in this file day by day. While I do see some changes, I run into a lot of other problems. Here's the link to the site: I am trying to compare the fpers.dat file that is generated every day. I want to compare yesterday's with today's.

    Here's the direct link to download the fper.dat file:
    *The information is public record.

    I've also attached some JPEGs corresponding to the problems below.

    1. The header. Every page has a header with mostly the same text, except for the date and the page number. I've tried [09]-[09], ^.*Date: \d\d/\d\d/\d\d.*Page:.*$, *Page: $. Not every page number is ignored, and the date is definitely not ignored.

    1 cont. and 2. I've used RegEx with the delimiter to ignore the repeating text in that header, and it kind of works. But halfway down the file it's no longer ignored.

    3. Also, I only want real changes, but it shows me when text has just moved up or down a few lines. I really just want to see the change in the text itself (the content, not its position in the doc). I don't care if it just moved to another line. And these .dat files seem to be about 3000 pages, so I really need to filter out as much as possible.

    4. Also, I'm just trying to look at the lines containing people within the LEG agency (Column 8), not everyone in the files.
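
One possible reason for the partial matches in item 1: `[09]` is a character class holding only the characters 0 and 9, while `[0-9]` (or `\d`) covers every digit. The sketch below shows the difference using Python's `re` module, which is an assumption here and may not behave identically to Beyond Compare's regex engine. The sample header line is made up for illustration; the real layout of the file may differ.

```python
import re

# Hypothetical header line; the real header text in the .dat file may differ.
header = "Employee Listing   Date: 03/14/16   Page: 127"

# [09] matches only the characters '0' and '9'.
narrow = re.compile(r"Page:\s*[09]+\s*$")
# [0-9] is a range that matches any decimal digit.
broad = re.compile(r"Page:\s*[0-9]+\s*$")
# The whole-line pattern quoted in item 1.
full_line = re.compile(r"^.*Date: \d\d/\d\d/\d\d.*Page:.*$")

print(bool(narrow.search(header)))       # page 127 contains digits other than 0 and 9
print(bool(narrow.search("Page: 90")))   # page 90 is made only of 0 and 9
print(bool(broad.search(header)))
print(bool(full_line.match(header)))
```

With a class like `[09]`, only page numbers made up entirely of 0s and 9s would be matched, which would look like "some page numbers are ignored and some are not".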

    Basically, I just want to see the day-by-day changes to this government employee information (under the LEG agency/category) and know the best way to do it.

    Any help you can offer would be greatly appreciated. I'd be happy to provide any more information you might need.
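
Outside of Beyond Compare, a comparison that ignores line position can be sketched as a set difference over the lines of the two files. This is only a rough illustration under assumptions: the function names are hypothetical, the file is assumed to be plain text with one record per line, and the agency code is assumed to appear as a whitespace-separated token such as `LEG`. Page headers would need to be removed beforehand, since their date and page number differ between days.

```python
def load_records(path, keyword=None):
    """Return the set of non-blank lines, with runs of whitespace collapsed.

    If keyword is given, keep only lines that contain it as a
    whitespace-separated token (an assumption about the file layout).
    """
    records = set()
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            text = " ".join(line.split())  # normalise spacing, drop newline
            if not text:
                continue  # skip blank lines
            if keyword is not None and keyword not in text.split():
                continue  # skip lines for other agencies
            records.add(text)
    return records


def daily_changes(old_path, new_path, keyword=None):
    """Return (removed, added) lines, regardless of where they sit in the file."""
    old = load_records(old_path, keyword)
    new = load_records(new_path, keyword)
    return sorted(old - new), sorted(new - old)
```

For example, `daily_changes("yesterday.dat", "today.dat", keyword="LEG")` (file names are placeholders) would return the LEG lines that disappeared and the ones that appeared. A record whose phone number changed shows up once in each list.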

    Thank you

  • #2

    I would recommend BC4's Table Compare session type (an evolution of the BC2 Data Viewer). This will sort your files into rows/columns, and you can mark a column as the Key by right-clicking the column header and selecting Key, Std, or Unimportant. The default Key is column 1, but you can set any other column or combination of columns as the Key. The Key should be unique, such as an EmployeeID or a combination of FirstName and LastName columns. Would Column 2 be your unique ID? Remember to set Col 1 back to Standard, so the Key isn't Col1 + Col2.

    One hurdle is that your files contain multiple rows of header information, which would be sorted in with the rest of the data (our Table Compare only anticipates a single Header row). The other is that you have many blank lines, which are sorted to the top. Could your file be processed to remove the blank lines or extra header lines?
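
As a rough sketch of that kind of preprocessing, the script below drops blank lines and any line that looks like a page header. It is an illustration under assumptions, not a Beyond Compare feature: the function names are hypothetical, and the only header pattern used is the Date/Page one quoted in the first post. Other repeated header lines would need their own patterns added to the list.

```python
import re

# Assumed header pattern, based on the Date/Page regex quoted in the first post.
# Add more patterns here for any other repeated header lines.
HEADER_PATTERNS = [
    re.compile(r"Date: \d\d/\d\d/\d\d.*Page:"),
]


def clean_lines(lines, header_patterns=HEADER_PATTERNS):
    """Return the lines with blank lines and header lines removed."""
    cleaned = []
    for line in lines:
        text = line.rstrip("\r\n")
        if not text.strip():
            continue  # blank or whitespace-only line
        if any(p.search(text) for p in header_patterns):
            continue  # repeated page header
        cleaned.append(text)
    return cleaned


def clean_file(src_path, dst_path, header_patterns=HEADER_PATTERNS):
    """Write a cleaned copy of src_path to dst_path; return the line count."""
    with open(src_path, encoding="utf-8", errors="replace") as src:
        cleaned = clean_lines(src, header_patterns)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.writelines(text + "\n" for text in cleaned)
    return len(cleaned)
```

Running `clean_file` on each day's download (paths are up to you) should leave only data rows. Note that this removes every matching header line, so if Table Compare should see a single column header row, that row would have to be kept or added back separately.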
    Aaron P Scooter Software


    • #3
      Hi Aaron,

      Thank you for your response. I opened the .dat files in MS Word 2003 and, using Word's Replace feature, was able to delete all of the repetitive headers. I was not able to delete the space, though. The only way was to go through the 3,000-page file and delete it manually. But deleting the space did not seem to matter. I saved the new .dat files and imported them into BC4 Table View. You were very helpful, and I was able to do almost everything I wanted. The BC4 Table View is great.

      I just have two more questions, if you wouldn't mind. Hopefully they will be easy to answer.

      How can I select multiple columns as the key at one time? These are big files, and it takes forever to reload them every time I select a new column as the key.

      How do I search for specific text in BC4?



      • #4

        No problem. No, we don't currently support selecting/editing multiple columns at once, including setting the Key. You can right-click any header in the main interface, or use the Session Settings dialog -> Columns tab, to edit each column one at a time.

        And you are trying to Find specific text while in an open Table Compare? The Search menu -> Find (Ctrl+F) will search the Left/Right/Both sides for specific text.
        Aaron P Scooter Software