
Folder+File compare - ignore "specials"


  • Folder+File compare - ignore "specials"

    spcl1-CHMdecompile <!--\d+-->

    I've seen some other questions that seem similar but can't quite wrap my head around how to do this.
    Sorry for long posting below but I was trying to keep notes as I tried various things.

    Summary of what I think I found:
    (1) MAJOR: It's impossible to have "rules" specified for individual file comparisons be used at the folder (parent) level.
    (2) MINOR bug - changing order of Grammar elements as specified in [Edit grammar...] does not change order of these elements in the Text Compare - Session Settings - Importance - Grammar elements list.
    (3) CONFUSING: Text Compare - Session Settings - Replacement does nothing ... or at least nothing that I can identify


    I want to define a rule used for all text files (regardless of extension) that will cause differences matching the rule to be ignored when deciding whether the two versions of the file are identical.

    To make sure we are talking about the same thing:
    I have two versions of a CHM help file and want to identify the differences between them. I found a program (with a 30-day test version) that will decompile a CHM file into multiple files in a folder.

    The above program inserts the following two lines in each "text" file:
    (1) font-size: 11px; text-decoration: none;">The CHM file was converted to HTM by Trial version of <b>ChmD<!--62-->ecompiler</b>.</a>
    (2) font-size: 11px; text-decoration: none;">Download <b>ChmDec<!--62-->ompiler</b> at:</a>

    where "62" is a one- to three-digit number that varies for each file the decompiler creates in each run.
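To make the pattern concrete, here is a minimal Python sketch (outside Beyond Compare) of what ignoring these markers amounts to. The function name `strip_markers` is made up for illustration, the one-to-three digit bound comes from the description above, and the `154` value on the second sample line is borrowed from the example later in this post.

```python
import re

# An HTML comment that contains only a 1-3 digit number, as described above.
MARKER = re.compile(r"<!--\d{1,3}-->")

def strip_markers(text: str) -> str:
    """Remove the decompiler's numeric comment markers from a line or file."""
    return MARKER.sub("", text)

# The same inserted line as it might appear in two separate decompiler runs.
left = 'font-size: 11px; text-decoration: none;">Download <b>ChmDec<!--62-->ompiler</b> at:</a>'
right = 'font-size: 11px; text-decoration: none;">Download <b>ChmDec<!--154-->ompiler</b> at:</a>'

same_raw = left == right                                        # markers differ
same_normalized = strip_markers(left) == strip_markers(right)   # markers removed
print(same_raw, same_normalized)
```

Ordinary comments such as `<!--note-->` are left alone, because the pattern only accepts digits between the comment delimiters.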

    (1) MAJOR:
    As a result, when I use BC3 to compare the two folders, select all, and ask for [Compare contents] (=?), all files show differences, not just the ones that have differences I consider significant :-(

    Picking one of the HTML files for testing purposes and clicking the [Referee] says it's HTML and allows me to define Grammar items for HTML. So I define:
    spcl1-CHMdecompile=Text matching <!--\d+-->
    Match character case=OFF
    Regular expression=ON
    This element is case insensitive

    This adds spcl1-CHMdecompile to the HTML list of Grammar elements. If I uncheck all offered Grammar elements (Keyword, String, Comment, Operator and my special one: spcl1-CHMdecompile), then the two versions DO compare as equal ... BUT ... there seems to be no way to get these "rules" honored in the Folder compare. When I click back to the Folder level, each file that I have manually checked now shows the "squiggly equal" icon [Ignore unimportant differences].

    i.e. I've set [Folder Compare - Session Settings] - [v] Compare contents - (o) Rules-based comparison.
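For reference, the folder-level result being asked for can be approximated outside Beyond Compare with a short Python sketch. This is not how Beyond Compare works internally; the helper names (`normalized`, `list_files`, `compare_folders`) and the status strings are made up for illustration. Files are read as bytes so that non-text files in the decompiled folder are still compared exactly.

```python
import re
from pathlib import Path

# Numeric comment markers, matched on raw bytes so any file type can be read.
MARKER = re.compile(rb"<!--\d+-->")

def normalized(path: Path) -> bytes:
    """File contents with the numeric comment markers removed."""
    return MARKER.sub(b"", path.read_bytes())

def list_files(root: Path) -> set[str]:
    """Relative paths of all files below root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}

def compare_folders(left: Path, right: Path) -> dict[str, str]:
    """Map each relative path to 'same', 'different', 'left only' or 'right only'."""
    left_files, right_files = list_files(left), list_files(right)
    result = {}
    for name in sorted(left_files | right_files):
        if name not in right_files:
            result[name] = "left only"
        elif name not in left_files:
            result[name] = "right only"
        elif normalized(left / name) == normalized(right / name):
            result[name] = "same"
        else:
            result[name] = "different"
    return result
```

Calling `compare_folders(Path("run1"), Path("run2"))` on two decompiler output folders (hypothetical names) would then report only the files whose content differs once the markers are ignored.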

    Note: AFAICT the example lines I showed above should match both as HTML contents and under my "special rule", CORRECT? However, if I mark ANY of the elements Keyword, String, Comment, Operator and spcl1-CHMdecompile as being IMPORTANT, then the two lines above get flagged as differences.

    (2) MINOR BUG? - In Grammar elements for HTML I moved my special rule above the Comments rule; however, in the Text Compare - Importance list the order of the Grammar elements remains unchanged.

    **LATER** It appears that the actual rule that is being (IMO) incorrectly triggered is the Strings rule. If it is set to "Important" then the added stuff gets flagged as different.
    I think it's because the actual text in which this appears is (for example):
    <a href="" target="_blank" style="font-family: Tahoma, Verdana;
    font-size: 11px; text-decoration: none;">Download <b>ChmDec<!--154-->ompiler</b> at:</a>
    Note that the first line above has an opening quote just before >>font-family<< and its closing quote just after >>none;<< on the second line. Am I correct that your parsing engine only looks at single lines, so that it thinks that:
    >>||">Download <b>ChmDec<!--154-->ompiler</b> at:</a>||<<
    is a "string", albeit with an unmatched leading quote?

    i.e. for this particular difference to get recognized as either an HTML comment or my special grammar rule, I would have to make strings unimportant, thus making it impossible to recognize *real* differences in strings in two versions of an HTML file?

    (3) CONFUSING: In a last attempt to get this to work, I made a copy of the [HTML] file format as [HTML-test], moved it above [HTML] in the list so it got recognized, then removed my special rule and instead tried to do the same thing in the Text Compare - Session Settings - Replacement tab.
    Help for this says:
    >Replacements identify repetitive changes
    >that should be considered unimportant. You
    >can specify the text to match on one side
    >and the text that replaces it on the other

    I tried LS=<!--\d+--> RS=NULL
    and then added LS=NULL RS=<!--\d+-->
    I can't see any effect in the File compare when using either one or both of the above replacements. What I was hoping for is that the occurrence of *ANY* string matching the regular expression <!--\d+--> in either the left-side file or the right-side file would be replaced by the string NULL before the comparison is done, hence making both sides compare as equal.

  • #2

    1) You will need to toggle on Ignore Unimportant Differences in the toolbar to treat Unimportant/Squiggly differences as 'Equal', and ignore them for the sake of comparison. Turning on the Rules-based comparison will use the default session settings for each file (as if you double clicked on them). If you need to alter the settings for this specific session, click into a file, alter the Session Settings, but instead of clicking Ok, use the lower left drop down to Apply to these Files, Apply to All Files, or Also Update Session Defaults. Apply to Files options are only enabled if the current session (text compare) is a child of a parent session (folder compare). Re-save the Folder Compare to keep these settings.

    2) This can be the case; however, you can alter the string definition to not End at End of Line, which would then properly continue on until closed. The issue you are running into here is that a String of equivalent text is not equal to another grammar. So:
    "This is not equal" <> This is not equal
    Is considered a difference. The text is equal, but one is a string, and the other is Everything Else (or another grammar). If strings were not defined with " characters, then only the " characters would be a difference there.

    3) Replacements must match on specific text, and cannot match on nothing. They would be more useful for you if you were attempting to match <--code--> to //code//, where code is a numeric that is the same on both sides. This would then consider those to be equal, and would not consider either unimportant if it aligned to something else. Basically, the first/Left expression can use regular expressions as you would expect, but the right side can only use back references ($1) to point to parts of the first expression.
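The back-reference idea can be sketched in Python as follows. This is a simplified model under stated assumptions, not Beyond Compare's implementation: the function `is_replacement` is hypothetical, and Python writes the back reference as `\1` where the description above uses `$1`.

```python
import re

def is_replacement(left: str, right: str, pattern: str, template: str) -> bool:
    """True if left matches pattern and right is the template filled from that match."""
    m = re.fullmatch(pattern, left)
    return m is not None and m.expand(template) == right

PATTERN = r"<--(\d+)-->"   # left side: regular expression with one capture group
TEMPLATE = r"//\1//"       # right side: literal text plus a back reference

print(is_replacement("<--42-->", "//42//", PATTERN, TEMPLATE))  # same number on both sides
print(is_replacement("<--42-->", "//43//", PATTERN, TEMPLATE))  # numbers differ
print(is_replacement("<--42-->", "", PATTERN, TEMPLATE))        # right side is empty
```

In this model a pair counts as a replacement only when the right side carries the same captured number, which is why it does not fit markers whose numbers differ between the two files.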

    For your case, you probably want to define Unimportant text, set that as the default session setting, add the grammar element to each file format that needs it, and give it the same name so that it is Unimportant across all of them.

    It sounds like you have figured out most of the steps necessary to define unimportant text, but here is a handy KB article that may help:

    Once it is defined in each file format, edit the default session settings from the Home Screen -> Edit Session Defaults folder, or from the Session Settings dialog's "Also update session defaults" option.

    Let us know if you are still running into any trouble. Feel free to send us example files, your Support Package (Help menu -> Support; Export), and screenshots to [email protected]
    Aaron P Scooter Software