Ignoring differences in csv files

  • Ignoring differences in csv files

    I used to use BC2.0 and had no problem configuring it to ignore these differences. After a year or so away from it I need to do this again, but I now have a new PC and had to order BC3.0 Pro, and I cannot figure out how to make it do the same thing.
    The csv files have row numbers in the first field of each row. I need to ignore row# differences between 2 csv files when one is missing some rows. I defined a new file format (since csv didn't have a Grammar tab) and defined a new element named csvRow# = ^.*?,
    (regular expr), then unchecked it on session settings > importance tab.
    This caused all differences between the files to disappear, even though there were other valid differences.

    The two files have misc text strings that I know will not be the same, so when generating the csv files I enclosed those strings in double at-signs (@@). I then created a new grammar element:
    [email protected]@ = @@.*[email protected]@
    (regular expr), then I unchecked it on session settings > importance tab (and rechecked the csvRow# so that the differences were showing again).

    The "@@<string>@@" strings were not ignored, nor did they have a blue font.

    I am probably messing up the regular expressions - can someone help me out?

  • #2

    For the first regular expression, you have
    ^ Beginning of line
    .* anything
    ? 0 or 1 matches (not sure exactly how this would impact it)
    , end at a comma

    The reason this is having trouble is that the , character also matches the .* (anything) section, so this grammar can match the entire line up to the last comma. If you only want the first field, something closer to this would work:
    ^[^,]*,
    Instead of .*, this matches 0 or more "not ," characters, then ends on the first "," it finds.
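
    As a rough illustration outside of Beyond Compare, the difference between the greedy pattern and the "not ," pattern can be checked in Python. The sample row is made up for the example, and Python's re module is only a stand-in: it is an assumption that BC3's regular expression engine behaves the same way for these two patterns.

```python
import re

# Hypothetical sample row; the first field is the row number.
row = "6,03.DATA_REP,DR59,,1"

# Greedy: .* also consumes commas, so the match runs to the LAST comma.
greedy = re.match(r"^.*,", row)

# Negated class: [^,]* cannot cross a comma, so the match ends at the FIRST comma.
first_field = re.match(r"^[^,]*,", row)

print(greedy.group(0))       # 6,03.DATA_REP,DR59,,
print(first_field.group(0))  # 6,
```

    Note that Python treats .*? as a lazy quantifier; this sketch does not show how BC3 handles that form.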

    For your second definition, Grammar elements also match from left to right as the line is scanned. If you have a String grammar that is swallowing the text from " to ", then the other @@ to @@ grammar inside of it will not override it, even if it is higher in the priority list. The priority list influences which element wins but is not absolute. If the @@ grammar is higher in the list and includes the quotes ("@@ to @@"), it should then work.

    You can always check which text is detected as which grammar element by clicking your blinking cursor into a section of text, then checking the bottom status bar for each pane. It will show the currently detected grammar name or "Default text" next to the line number:column position.

    Do you still have BC2 installed on your old machine? If so, you can use the Tools menu -> Export your settings, and install BC2 on the new machine as well. You cannot directly import your settings into BC3, but you can then launch BC2 and copy/paste your Regular Expressions from BC2 into BC3 grammars. In BC2, these were defined as String (Important) or Unimportant. In BC3, you can give the grammar a name, then define its importance in the Session Settings (per session or as a global default).

    If you have any questions or need any help, please contact us here on the forum or by emailing us at [email protected]. If you email us, please include a link to this forum post for our reference.
    Aaron P Scooter Software


    • #3
      I tried those changes to each grammar element (in Tools-> File Format-> TMMcsv, a special file format that I created) with no difference in results. The tool is detecting the correct TMMcsv file format.
      First element: "@@.*[email protected]@"
      Second: ^[^,]*,

      I reopened the 2 files (show diffs, ignore unimportant diffs) and tried enabling each element one at a time in the Session Settings-> Importance. In both cases, no text is highlighted in blue and the differences are not being ignored.

      Here is the left line difference with the first element enabled:

      6,03.DATA_REP,@@[email protected]@,DR59,,1,CURNT VERS IND =,Y,,,OWNR PROJ ID =,1000,,,,,,,,,@@[email protected]@,@@2012-09-20 13:36:[email protected]@

      right file diff line:
      6,03.DATA_REP,@@[email protected]@,DR59,,1,CURNT VERS IND =,Y,,,OWNR PROJ ID =,1000,,,,,,,,,@@[email protected]@,@@2012-09-20 13:36:[email protected]@

      In the left file, the last zero in @@[email protected]@ is highlighted in red. In the right file, the matching 8 is highlighted as the difference. I expect the two rows not to be flagged as different, since the only diff is inside that grammar element. If there were another text diff between the lines, I would expect the diff lines to show up with blue highlights on the @@[email protected] string and red highlights on the other string diffs.

      Am I still doing something wrong?


      • #4

        Would it be possible to send us a pair of example files by email? Our email is [email protected]

        Please also include a link back to this forum thread and your exported settings (from your Help menu -> Support; Export). This way we can verify your specific settings and files, and recreate the exact comparison you are attempting.
        Aaron P Scooter Software


        • #5

          Thanks for the files. It isn't the String vs. Other Grammar issue like I thought it might be, as the trouble line does not have quotes. I used this RegEx for that line:
          @@[^@][email protected]@

          How does this work for you?
          Aaron P Scooter Software
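
          For anyone who wants to sanity-check this kind of pattern outside of Beyond Compare, here is a minimal Python sketch of the same idea: blank out the row-number field and anything between @@ markers, then compare what is left. The pattern @@[^@]*@@, the normalize helper, and the sample rows are all assumptions made for illustration; they are not the exact expression or data from this thread, and BC3's grammar matching may differ from Python's re module.

```python
import re

# First field up to and including the first comma (the row number).
ROW_NUMBER = re.compile(r"^[^,]*,")
# Assumed pattern: @@, then any run of non-@ characters, then @@.
MARKED = re.compile(r"@@[^@]*@@")

def normalize(row):
    """Drop the row number and blank out every @@...@@ span (hypothetical helper)."""
    row = ROW_NUMBER.sub("", row, count=1)
    return MARKED.sub("@@@@", row)

# Hypothetical rows: left and right differ only in row number and marked text;
# 'other' also differs in an unmarked field.
left = "6,03.DATA_REP,@@100@@,DR59,Y"
right = "7,03.DATA_REP,@@108@@,DR59,Y"
other = "7,03.DATA_REP,@@108@@,DR60,Y"

print(normalize(left) == normalize(right))  # True
print(normalize(left) == normalize(other))  # False
```

          The negated class [^@] keeps each match inside a single pair of markers, in the same way that [^,] keeps the row-number match inside the first field.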