unimportant vs. default text


  • #16
    I was referring to your smaller test files. The other approach that may work is to use the data compare instead of the text compare. It's designed to handle files where there's a primary record ID that should be used for alignment, and I just wanted to see what your data looked like so I could say whether it would work for you, and if so, how you should configure it.
    Zoë P Scooter Software



    • #17
      Originally posted by Craig View Post
      Are the record numbers ordered and increasing?
      Yes they are, strictly mathematically speaking at least. They are negative numbers, so the absolute value is decreasing.

      Like -100, -99, -98

      However, they are not necessarily consecutive; gaps do occur. (But normally I would expect the same gaps in both files.)

      The following regexp matches the record numbers

      Code:
      ^-[0-9]{10}
      Originally posted by Craig View Post
      Are the record numbers ordered and increasing?
      How could BC's algorithms benefit from such a property?



      • #18
        OK, I see. (Your reply "passed" my previous comment). I have never looked into data compare. Have to do that.



        • #19
          Hmm, at first glance, it looks like data compare cannot be used.

          Code:
          -0087891776 |  |    NR:0423:C00BC7AC ptrace           .linux\select\do_sys_poll+0x25C    0.110us
                      |                        fput_light(file, fput_needed);
                      |                }
                      |        }
                  644 |        pollfd->revents = mask;
                      |  | ldr     r3,[r11,#-0x38C]
                  684 |                                        count++;
                      |  | orrs    r2,r10,r3         ; r2,fdcount,r3
                  675 |                        for (; pfd != pfd_end; pfd++) {
                      |  | bne     0xC00BC7FC
          -0087891775 |  |    NR:0423:C00BC7B8 ptrace           .linux\select\do_sys_poll+0x268   <0.020us
          The 10 digit negative numbers are the record numbers I need for alignment.
          However, other text can appear in the same columns. (3 digit numbers in the example). Not sure how they could be ignored.
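
As a quick sanity check outside Beyond Compare, the pattern quoted above can be run against a few lines of the sample. This is only a sketch in Python: the helper name `record_number` and the shortened `sample` list are made up for illustration, and it shows nothing more than that the pattern picks out the 10-digit record lines while skipping the 3-digit source line numbers that share the same columns.

```python
import re

# Pattern quoted in the thread: a minus sign followed by ten digits at line start.
RECORD_RE = re.compile(r"^-[0-9]{10}")

def record_number(line):
    """Return the record number as an int, or None if the line does not start with one."""
    m = RECORD_RE.match(line)
    return int(m.group(0)) if m else None

# A few lines abridged from the trace excerpt above.
sample = [
    "-0087891776 |  |    NR:0423:C00BC7AC ptrace           .linux\\select\\do_sys_poll+0x25C    0.110us",
    "            |                        fput_light(file, fput_needed);",
    "        644 |        pollfd->revents = mask;",
    "            |  | ldr     r3,[r11,#-0x38C]",
    "-0087891775 |  |    NR:0423:C00BC7B8 ptrace           .linux\\select\\do_sys_poll+0x268   <0.020us",
]

records = [record_number(line) for line in sample]
for line, rec in zip(sample, records):
    print("record" if rec is not None else "other ", line[:40])
```

The line starting with `644` is not matched because the pattern is anchored at the start of the line and requires a leading minus sign.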



          • #20
            Yeah, I'd agree; the data compare probably won't work. Have you tried using line weights yet? Since your record numbers are unique you should make a weight that matches them and give the weight a 5 priority. That will force it to align any matching record numbers it sees as soon as it finds them. That combined with a sufficiently large skew tolerance may be enough to get a proper alignment.
            Zoë P Scooter Software



            • #21
              Originally posted by Craig View Post
              Have you tried using line weights yet?
              Yes, I have. See my posting above

              Originally posted by ugeuder View Post
              If the grammar is empty and I try to achieve the alignment using line weights, it doesn't work at all. (The same as with no line weight at all)
              I defined a line weight of 5 for
              Code:
              ^-[0-9]{10}
              but nothing happened. Alignment was like with default text format.

              Maybe I have to try it again, just in case I had a stupid typo. (I can't do it right now, because I'm not at work.)



              • #22
                Ah, forgot to ask

                Originally posted by Craig View Post
                a sufficiently large skew tolerance may be enough to get a proper alignment.
                What's the unit of skew? Lines? Bytes?

                In the files currently discussed I'd typically have about 10 inserted lines on one side. Sometimes maybe a bit more, but I don't think that should be the problem. From my many years of using BC (with all kinds of files), I'd say I have seen much longer insertions, and BC has been able to align correctly after them with just the default skew.

                Or is the length of the match also relevant? Of course my matches here are always only 11 characters long (the record number).



                • #23
                  Skew tolerance is in lines. The length of the match isn't important, but the line weights are only applied if the lines match exactly (excluding unimportant text). Basically the skew tolerance is the maximum amount that it can reliably handle for inserts.
                  Zoë P Scooter Software
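
To make the two points in this reply concrete (skew is counted in lines, and only exactly matching lines count), here is a toy aligner in Python. It is explicitly not Beyond Compare's algorithm, which the thread does not describe; the function `align` and its greedy look-ahead are assumptions chosen only to illustrate how an insert longer than the look-ahead window can prevent resynchronisation.

```python
def align(left, right, skew):
    """Pair up equal lines, looking at most `skew` lines ahead on either side.

    Toy model only: lines must be exactly equal to be paired.
    """
    pairs = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] == right[j]:
            pairs.append((i, j))
            i += 1
            j += 1
            continue
        # Search for the nearest resynchronisation point inside the window.
        found = None
        for d in range(1, skew + 1):
            if i + d < len(left) and left[i + d] == right[j]:
                found = (i + d, j)
                break
            if j + d < len(right) and left[i] == right[j + d]:
                found = (i, j + d)
                break
        if found is None:
            # Nothing within the window: treat both lines as changed and move on.
            i += 1
            j += 1
        else:
            i, j = found
    return pairs

left = ["a", "b", "c"]
right = ["a", "x1", "x2", "x3", "b", "c"]  # three lines inserted after "a"

print(align(left, right, skew=2))  # window too small to skip the insert
print(align(left, right, skew=3))  # window just large enough
```

In this toy model, a three-line insert is only bridged once the window is at least three lines. Two record lines that share a record number but differ elsewhere (for example in the timing column) would not be paired here either, since the comparison is on the whole line.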
