
Calculating percentage from the report


  • Calculating percentage from the report

    I have been asked to submit testimony as an expert witness in a software piracy trial because I have 30 years of experience in the language being used.

    I am familiar with Beyond Compare as I have used it for years to compare versions of my own source code.

    Now I am trying to use it to compare source owned by two different developers.
    There are hundreds of files to compare, so it isn't practical to eyeball all this code and estimate a percentage. I have been asked to provide, for each pair of source files with the same name, the percentage of lines in which the code matches exactly.

    For example here is a report:

    6259 same line(s)
    261 unimportant left orphan line(s)
    110 unimportant right orphan line(s)
    207 unimportant difference line(s)
    3782 important left orphan line(s)
    2430 important right orphan line(s)
    1248 important difference line(s)

    752 difference section(s)

    With only the above information, could I write an algorithm that uses this report data to give me a percentage of matching lines of code?

  • #2
    Beyond Compare doesn't provide a report that lists percentage of matching lines.

    You can obtain the total number of lines in each file by adding up the summary counts.

    Left file lines = same lines + unimportant left orphan lines + unimportant difference lines + important left orphan lines + important difference lines

    Right file lines = same lines + unimportant right orphan lines + unimportant difference lines + important right orphan lines + important difference lines
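
    As a rough sketch of the two sums above, the snippet below plugs in the counts from the example report in the first post. The ReportCounts class and its field names are made up for illustration; they are not something Beyond Compare provides.

```python
from dataclasses import dataclass


@dataclass
class ReportCounts:
    """Line counts copied by hand from a Beyond Compare text report (hypothetical container)."""
    same: int
    unimportant_left_orphan: int
    unimportant_right_orphan: int
    unimportant_difference: int
    important_left_orphan: int
    important_right_orphan: int
    important_difference: int


def left_total(c: ReportCounts) -> int:
    # Every left-side line is either same, a left orphan, or a difference line.
    return (c.same + c.unimportant_left_orphan + c.unimportant_difference
            + c.important_left_orphan + c.important_difference)


def right_total(c: ReportCounts) -> int:
    # Same idea for the right side, using the right orphan counts.
    return (c.same + c.unimportant_right_orphan + c.unimportant_difference
            + c.important_right_orphan + c.important_difference)


counts = ReportCounts(6259, 261, 110, 207, 3782, 2430, 1248)
print(left_total(counts))   # 11757
print(right_total(counts))  # 10254
```

    The "difference section(s)" figure counts sections rather than lines, so it is left out of both totals.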

    Unimportant difference lines are those that exist on both sides but with changes in whitespace, character case, or comments, so you can probably consider them a match.

    A percentage of original file lines that exist in a possible copied file might look like:
    (same lines + unimportant difference lines) / total left side lines

    A percentage of the potentially copied file lines that are from the original file might look like:
    (same lines + unimportant difference lines) / total right side lines
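
    Putting the two ratios together, here is a minimal sketch that reads the summary text and prints both percentages. It assumes the report lines look exactly like the example in the first post ("<number> <label> line(s)"); parse_counts and match_percentages are hypothetical helper names, not part of Beyond Compare. Because the original question asks for lines that match exactly, the count_unimportant flag lets you leave the unimportant difference lines out of the numerator.

```python
import re

REPORT = """\
6259 same line(s)
261 unimportant left orphan line(s)
110 unimportant right orphan line(s)
207 unimportant difference line(s)
3782 important left orphan line(s)
2430 important right orphan line(s)
1248 important difference line(s)

752 difference section(s)
"""


def parse_counts(report: str) -> dict:
    """Map each '<label> line(s)' entry to its count; section counts are skipped."""
    pattern = r"^[ \t]*(\d+)[ \t]+(.+?)[ \t]+line\(s\)[ \t]*$"
    return {label: int(n) for n, label in re.findall(pattern, report, re.MULTILINE)}


def match_percentages(c: dict, count_unimportant: bool = True):
    """Return (percent of left lines matched, percent of right lines matched)."""
    matching = c["same"]
    if count_unimportant:
        # Treat whitespace/case/comment-only changes as matches.
        matching += c["unimportant difference"]
    shared = c["same"] + c["unimportant difference"] + c["important difference"]
    left = shared + c["unimportant left orphan"] + c["important left orphan"]
    right = shared + c["unimportant right orphan"] + c["important right orphan"]
    return 100 * matching / left, 100 * matching / right


counts = parse_counts(REPORT)
left_pct, right_pct = match_percentages(counts)
print(f"{left_pct:.2f}% of left, {right_pct:.2f}% of right")      # 55.00% of left, 63.06% of right

strict_left, strict_right = match_percentages(counts, count_unimportant=False)
print(f"{strict_left:.2f}% of left, {strict_right:.2f}% of right")  # 53.24% of left, 61.04% of right
```

    Whichever variant you report, it is worth stating which counts went into the numerator and which file's total is the denominator, since the same report gives four different percentages here.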

    Note that Beyond Compare doesn't detect moved lines; they're shown as adds and deletes. This means that if the copied code had a function moved from the beginning of the original file to the end, those lines won't be reported as same lines.
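
    To see why a moved block lowers the "same" count, here is a small illustration using Python's standard difflib module. This is a generic line-based diff, not Beyond Compare's own algorithm, so take it only as a sketch of the general effect; the two tiny "files" are invented for the example.

```python
import difflib

original = [
    "def f():",
    "    return 1",
    "def g():",
    "    x = 2",
    "    return x",
]
# Same five lines, but f() has been moved from the top to the bottom.
copied = original[2:] + original[:2]

matcher = difflib.SequenceMatcher(None, original, copied, autojunk=False)
opcodes = matcher.get_opcodes()

# Lines inside "equal" blocks are the ones a line-based diff calls "same".
same = sum(i2 - i1 for tag, i1, i2, j1, j2 in opcodes if tag == "equal")

for tag, i1, i2, j1, j2 in opcodes:
    print(tag, original[i1:i2], copied[j1:j2])
print(f"{same} of {len(original)} lines reported as same")  # 3 of 5 lines reported as same
```

    Every line of the original is present in the copy, yet only the unmoved block is counted as same; the moved function shows up once as a delete and once as an insert.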

    There are dedicated tools for detecting source code plagiarism, and they might work better than a general-purpose comparison tool like Beyond Compare. A quick Google search for "source code plagiarism detection" turned up MOSS (Measure of Software Similarity) as an example.
    Chris K, Scooter Software


    • #3
      This is good information. Thank you.
      The expert on the other side used Beyond Compare to get a percentage, but he didn't explain how he arrived at it, and it didn't look right to me.