src/cm/ext/diff_match_patch.py
author Simon Descarpentries <sid@sopinspace.com>
Mon, 21 Oct 2013 16:37:07 +0200
changeset 553 bf26fb47a14c
parent 250 cae2de810f77
permissions -rwxr-xr-x
To allow scrolling in Safari mobile, we set the content of text_view_comments frame in a jQuery UI layout. So the automated scrolling operations in c_sync.js must be adjustable to the right part to scroll. Also, if a comment have to be shown outside of the current viewport, we scroll the correct part to that viewport and then set the comment top Y offset to juste what it needs to avoid the "Add comment" button after scrolling operation. If not in Safari mobile, we add an offset here to avoid comment to display under the "Add comment" button.
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
250
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     1
#!/usr/bin/python2.4
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     3
"""Diff Match and Patch
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     4
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     5
Copyright 2006 Google Inc.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     6
http://code.google.com/p/google-diff-match-patch/
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     7
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     8
Licensed under the Apache License, Version 2.0 (the "License");
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     9
you may not use this file except in compliance with the License.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    10
You may obtain a copy of the License at
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    11
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    12
  http://www.apache.org/licenses/LICENSE-2.0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    13
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    14
Unless required by applicable law or agreed to in writing, software
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    15
distributed under the License is distributed on an "AS IS" BASIS,
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    16
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    17
See the License for the specific language governing permissions and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    18
limitations under the License.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    19
"""
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    20
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    21
"""Functions for diff, match and patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    22
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    23
Computes the difference between two texts to create a patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    24
Applies the patch onto another text, allowing for errors.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    25
"""
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    26
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    27
__author__ = 'fraser@google.com (Neil Fraser)'
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    28
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    29
import math
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    30
import time
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    31
import urllib
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    32
import re
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    33
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    34
class diff_match_patch:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    35
  """Class containing the diff, match and patch methods.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    36
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    37
  Also contains the behaviour settings.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    38
  """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    39
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    40
  def __init__(self):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    41
    """Inits a diff_match_patch object with default settings.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    42
    Redefine these in your program to override the defaults.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    43
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    44
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    45
    # Number of seconds to map a diff before giving up (0 for infinity).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    46
    self.Diff_Timeout = 1.0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    47
    # Cost of an empty edit operation in terms of edit characters.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    48
    self.Diff_EditCost = 4
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    49
    # The size beyond which the double-ended diff activates.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    50
    # Double-ending is twice as fast, but less accurate.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    51
    self.Diff_DualThreshold = 32
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    52
    # At what point is no match declared (0.0 = perfection, 1.0 = very loose).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    53
    self.Match_Threshold = 0.5
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    54
    # How far to search for a match (0 = exact location, 1000+ = broad match).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    55
    # A match this many characters away from the expected location will add
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    56
    # 1.0 to the score (0.0 is a perfect match).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    57
    self.Match_Distance = 1000
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    58
    # When deleting a large block of text (over ~64 characters), how close does
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    59
    # the contents have to match the expected contents. (0.0 = perfection,
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    60
    # 1.0 = very loose).  Note that Match_Threshold controls how closely the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    61
    # end points of a delete need to match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    62
    self.Patch_DeleteThreshold = 0.5
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    63
    # Chunk size for context length.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    64
    self.Patch_Margin = 4
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    65
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    66
    # How many bits in a number?
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    67
    # Python has no maximum, thus to disable patch splitting set to 0.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    68
    # However to avoid long patches in certain pathological cases, use 32.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    69
    # Multiple short patches (using native ints) are much faster than long ones.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    70
    self.Match_MaxBits = 32
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    71
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    72
  #  DIFF FUNCTIONS
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    73
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    74
  # The data structure representing a diff is an array of tuples:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    75
  # [(DIFF_DELETE, "Hello"), (DIFF_INSERT, "Goodbye"), (DIFF_EQUAL, " world.")]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    76
  # which means: delete "Hello", add "Goodbye" and keep " world."
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    77
  DIFF_DELETE = -1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    78
  DIFF_INSERT = 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    79
  DIFF_EQUAL = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    80
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    81
  def diff_main(self, text1, text2, checklines=True):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    82
    """Find the differences between two texts.  Simplifies the problem by
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    83
      stripping any common prefix or suffix off the texts before diffing.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    84
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    85
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    86
      text1: Old string to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    87
      text2: New string to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    88
      checklines: Optional speedup flag.  If present and false, then don't run
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    89
        a line-level diff first to identify the changed areas.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    90
        Defaults to true, which does a faster, slightly less optimal diff.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    91
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    92
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    93
      Array of changes.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    94
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    95
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    96
    # Check for equality (speedup)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    97
    if text1 == text2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    98
      return [(self.DIFF_EQUAL, text1)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    99
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   100
    # Trim off common prefix (speedup)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   101
    commonlength = self.diff_commonPrefix(text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   102
    commonprefix = text1[:commonlength]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   103
    text1 = text1[commonlength:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   104
    text2 = text2[commonlength:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   105
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   106
    # Trim off common suffix (speedup)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   107
    commonlength = self.diff_commonSuffix(text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   108
    if commonlength == 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   109
      commonsuffix = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   110
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   111
      commonsuffix = text1[-commonlength:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   112
      text1 = text1[:-commonlength]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   113
      text2 = text2[:-commonlength]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   114
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   115
    # Compute the diff on the middle block
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   116
    diffs = self.diff_compute(text1, text2, checklines)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   117
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   118
    # Restore the prefix and suffix
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   119
    if commonprefix:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   120
      diffs[:0] = [(self.DIFF_EQUAL, commonprefix)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   121
    if commonsuffix:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   122
      diffs.append((self.DIFF_EQUAL, commonsuffix))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   123
    self.diff_cleanupMerge(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   124
    return diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   125
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   126
  def diff_compute(self, text1, text2, checklines):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   127
    """Find the differences between two texts.  Assumes that the texts do not
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   128
      have any common prefix or suffix.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   129
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   130
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   131
      text1: Old string to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   132
      text2: New string to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   133
      checklines: Speedup flag.  If false, then don't run a line-level diff
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   134
        first to identify the changed areas.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   135
        If true, then run a faster, slightly less optimal diff.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   136
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   137
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   138
      Array of changes.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   139
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   140
    if not text1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   141
      # Just add some text (speedup)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   142
      return [(self.DIFF_INSERT, text2)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   143
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   144
    if not text2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   145
      # Just delete some text (speedup)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   146
      return [(self.DIFF_DELETE, text1)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   147
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   148
    if len(text1) > len(text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   149
      (longtext, shorttext) = (text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   150
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   151
      (shorttext, longtext) = (text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   152
    i = longtext.find(shorttext)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   153
    if i != -1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   154
      # Shorter text is inside the longer text (speedup)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   155
      diffs = [(self.DIFF_INSERT, longtext[:i]), (self.DIFF_EQUAL, shorttext),
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   156
               (self.DIFF_INSERT, longtext[i + len(shorttext):])]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   157
      # Swap insertions for deletions if diff is reversed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   158
      if len(text1) > len(text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   159
        diffs[0] = (self.DIFF_DELETE, diffs[0][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   160
        diffs[2] = (self.DIFF_DELETE, diffs[2][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   161
      return diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   162
    longtext = shorttext = None  # Garbage collect.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   163
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   164
    # Check to see if the problem can be split in two.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   165
    hm = self.diff_halfMatch(text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   166
    if hm:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   167
      # A half-match was found, sort out the return data.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   168
      (text1_a, text1_b, text2_a, text2_b, mid_common) = hm
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   169
      # Send both pairs off for separate processing.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   170
      diffs_a = self.diff_main(text1_a, text2_a, checklines)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   171
      diffs_b = self.diff_main(text1_b, text2_b, checklines)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   172
      # Merge the results.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   173
      return diffs_a + [(self.DIFF_EQUAL, mid_common)] + diffs_b
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   174
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   175
    # Perform a real diff.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   176
    if checklines and (len(text1) < 100 or len(text2) < 100):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   177
      checklines = False  # Too trivial for the overhead.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   178
    if checklines:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   179
      # Scan the text on a line-by-line basis first.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   180
      (text1, text2, linearray) = self.diff_linesToChars(text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   181
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   182
    diffs = self.diff_map(text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   183
    if not diffs:  # No acceptable result.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   184
      diffs = [(self.DIFF_DELETE, text1), (self.DIFF_INSERT, text2)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   185
    if checklines:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   186
      # Convert the diff back to original text.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   187
      self.diff_charsToLines(diffs, linearray)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   188
      # Eliminate freak matches (e.g. blank lines)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   189
      self.diff_cleanupSemantic(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   190
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   191
      # Rediff any replacement blocks, this time character-by-character.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   192
      # Add a dummy entry at the end.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   193
      diffs.append((self.DIFF_EQUAL, ''))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   194
      pointer = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   195
      count_delete = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   196
      count_insert = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   197
      text_delete = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   198
      text_insert = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   199
      while pointer < len(diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   200
        if diffs[pointer][0] == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   201
          count_insert += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   202
          text_insert += diffs[pointer][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   203
        elif diffs[pointer][0] == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   204
          count_delete += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   205
          text_delete += diffs[pointer][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   206
        elif diffs[pointer][0] == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   207
          # Upon reaching an equality, check for prior redundancies.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   208
          if count_delete >= 1 and count_insert >= 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   209
            # Delete the offending records and add the merged ones.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   210
            a = self.diff_main(text_delete, text_insert, False)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   211
            diffs[pointer - count_delete - count_insert : pointer] = a
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   212
            pointer = pointer - count_delete - count_insert + len(a)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   213
          count_insert = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   214
          count_delete = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   215
          text_delete = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   216
          text_insert = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   217
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   218
        pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   219
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   220
      diffs.pop()  # Remove the dummy entry at the end.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   221
    return diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   222
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   223
  def diff_linesToChars(self, text1, text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   224
    """Split two texts into an array of strings.  Reduce the texts to a string
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   225
    of hashes where each Unicode character represents one line.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   226
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   227
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   228
      text1: First string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   229
      text2: Second string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   230
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   231
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   232
      Three element tuple, containing the encoded text1, the encoded text2 and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   233
      the array of unique strings.  The zeroth element of the array of unique
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   234
      strings is intentionally blank.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   235
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   236
    lineArray = []  # e.g. lineArray[4] == "Hello\n"
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   237
    lineHash = {}   # e.g. lineHash["Hello\n"] == 4
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   238
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   239
    # "\x00" is a valid character, but various debuggers don't like it.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   240
    # So we'll insert a junk entry to avoid generating a null character.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   241
    lineArray.append('')
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   242
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   243
    def diff_linesToCharsMunge(text):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   244
      """Split a text into an array of strings.  Reduce the texts to a string
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   245
      of hashes where each Unicode character represents one line.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   246
      Modifies linearray and linehash through being a closure.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   247
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   248
      Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   249
        text: String to encode.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   250
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   251
      Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   252
        Encoded string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   253
      """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   254
      chars = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   255
      # Walk the text, pulling out a substring for each line.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   256
      # text.split('\n') would would temporarily double our memory footprint.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   257
      # Modifying text would create many large strings to garbage collect.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   258
      lineStart = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   259
      lineEnd = -1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   260
      while lineEnd < len(text) - 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   261
        lineEnd = text.find('\n', lineStart)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   262
        if lineEnd == -1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   263
          lineEnd = len(text) - 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   264
        line = text[lineStart:lineEnd + 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   265
        lineStart = lineEnd + 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   266
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   267
        if line in lineHash:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   268
          chars.append(unichr(lineHash[line]))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   269
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   270
          lineArray.append(line)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   271
          lineHash[line] = len(lineArray) - 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   272
          chars.append(unichr(len(lineArray) - 1))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   273
      return "".join(chars)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   274
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   275
    chars1 = diff_linesToCharsMunge(text1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   276
    chars2 = diff_linesToCharsMunge(text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   277
    return (chars1, chars2, lineArray)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   278
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   279
  def diff_charsToLines(self, diffs, lineArray):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   280
    """Rehydrate the text in a diff from a string of line hashes to real lines
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   281
    of text.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   282
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   283
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   284
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   285
      lineArray: Array of unique strings.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   286
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   287
    for x in xrange(len(diffs)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   288
      text = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   289
      for char in diffs[x][1]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   290
        text.append(lineArray[ord(char)])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   291
      diffs[x] = (diffs[x][0], "".join(text))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   292
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   293
  def diff_map(self, text1, text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   294
    """Explore the intersection points between the two texts.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   295
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   296
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   297
      text1: Old string to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   298
      text2: New string to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   299
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   300
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   301
      Array of diff tuples or None if no diff available.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   302
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   303
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   304
    # Unlike in most languages, Python counts time in seconds.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   305
    s_end = time.time() + self.Diff_Timeout  # Don't run for too long.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   306
    # Cache the text lengths to prevent multiple calls.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   307
    text1_length = len(text1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   308
    text2_length = len(text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   309
    max_d = text1_length + text2_length - 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   310
    doubleEnd = self.Diff_DualThreshold * 2 < max_d
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   311
    v_map1 = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   312
    v_map2 = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   313
    v1 = {}
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   314
    v2 = {}
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   315
    v1[1] = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   316
    v2[1] = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   317
    footsteps = {}
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   318
    done = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   319
    # If the total number of characters is odd, then the front path will
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   320
    # collide with the reverse path.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   321
    front = (text1_length + text2_length) % 2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   322
    for d in xrange(max_d):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   323
      # Bail out if timeout reached.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   324
      if self.Diff_Timeout > 0 and time.time() > s_end:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   325
        return None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   326
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   327
      # Walk the front path one step.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   328
      v_map1.append({})
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   329
      for k in xrange(-d, d + 1, 2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   330
        if k == -d or k != d and v1[k - 1] < v1[k + 1]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   331
          x = v1[k + 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   332
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   333
          x = v1[k - 1] + 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   334
        y = x - k
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   335
        if doubleEnd:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   336
          footstep = (x, y)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   337
          if front and footstep in footsteps:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   338
            done = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   339
          if not front:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   340
            footsteps[footstep] = d
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   341
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   342
        while (not done and x < text1_length and y < text2_length and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   343
               text1[x] == text2[y]):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   344
          x += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   345
          y += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   346
          if doubleEnd:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   347
            footstep = (x, y)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   348
            if front and footstep in footsteps:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   349
              done = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   350
            if not front:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   351
              footsteps[footstep] = d
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   352
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   353
        v1[k] = x
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   354
        v_map1[d][(x, y)] = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   355
        if x == text1_length and y == text2_length:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   356
          # Reached the end in single-path mode.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   357
          return self.diff_path1(v_map1, text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   358
        elif done:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   359
          # Front path ran over reverse path.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   360
          v_map2 = v_map2[:footsteps[footstep] + 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   361
          a = self.diff_path1(v_map1, text1[:x], text2[:y])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   362
          b = self.diff_path2(v_map2, text1[x:], text2[y:])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   363
          return a + b
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   364
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   365
      if doubleEnd:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   366
        # Walk the reverse path one step.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   367
        v_map2.append({})
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   368
        for k in xrange(-d, d + 1, 2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   369
          if k == -d or k != d and v2[k - 1] < v2[k + 1]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   370
            x = v2[k + 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   371
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   372
            x = v2[k - 1] + 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   373
          y = x - k
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   374
          footstep = (text1_length - x, text2_length - y)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   375
          if not front and footstep in footsteps:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   376
            done = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   377
          if front:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   378
            footsteps[footstep] = d
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   379
          while (not done and x < text1_length and y < text2_length and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   380
                 text1[-x - 1] == text2[-y - 1]):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   381
            x += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   382
            y += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   383
            footstep = (text1_length - x, text2_length - y)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   384
            if not front and footstep in footsteps:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   385
              done = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   386
            if front:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   387
              footsteps[footstep] = d
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   388
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   389
          v2[k] = x
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   390
          v_map2[d][(x, y)] = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   391
          if done:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   392
            # Reverse path ran over front path.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   393
            v_map1 = v_map1[:footsteps[footstep] + 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   394
            a = self.diff_path1(v_map1, text1[:text1_length - x],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   395
                                text2[:text2_length - y])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   396
            b = self.diff_path2(v_map2, text1[text1_length - x:],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   397
                                text2[text2_length - y:])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   398
            return a + b
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   399
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   400
    # Number of diffs equals number of characters, no commonality at all.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   401
    return None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   402
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   403
  def diff_path1(self, v_map, text1, text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   404
    """Work from the middle back to the start to determine the path.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   405
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   406
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   407
      v_map: Array of paths.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   408
      text1: Old string fragment to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   409
      text2: New string fragment to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   410
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   411
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   412
      Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   413
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   414
    path = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   415
    x = len(text1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   416
    y = len(text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   417
    last_op = None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   418
    for d in xrange(len(v_map) - 2, -1, -1):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   419
      while True:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   420
        if (x - 1, y) in v_map[d]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   421
          x -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   422
          if last_op == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   423
            path[0] = (self.DIFF_DELETE, text1[x] + path[0][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   424
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   425
            path[:0] = [(self.DIFF_DELETE, text1[x])]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   426
          last_op = self.DIFF_DELETE
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   427
          break
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   428
        elif (x, y - 1) in v_map[d]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   429
          y -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   430
          if last_op == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   431
            path[0] = (self.DIFF_INSERT, text2[y] + path[0][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   432
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   433
            path[:0] = [(self.DIFF_INSERT, text2[y])]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   434
          last_op = self.DIFF_INSERT
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   435
          break
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   436
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   437
          x -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   438
          y -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   439
          assert text1[x] == text2[y], ("No diagonal.  " +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   440
              "Can't happen. (diff_path1)")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   441
          if last_op == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   442
            path[0] = (self.DIFF_EQUAL, text1[x] + path[0][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   443
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   444
            path[:0] = [(self.DIFF_EQUAL, text1[x])]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   445
          last_op = self.DIFF_EQUAL
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   446
    return path
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   447
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   448
  def diff_path2(self, v_map, text1, text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   449
    """Work from the middle back to the end to determine the path.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   450
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   451
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   452
      v_map: Array of paths.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   453
      text1: Old string fragment to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   454
      text2: New string fragment to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   455
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   456
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   457
      Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   458
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   459
    path = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   460
    x = len(text1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   461
    y = len(text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   462
    last_op = None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   463
    for d in xrange(len(v_map) - 2, -1, -1):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   464
      while True:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   465
        if (x - 1, y) in v_map[d]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   466
          x -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   467
          if last_op == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   468
            path[-1] = (self.DIFF_DELETE, path[-1][1] + text1[-x - 1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   469
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   470
            path.append((self.DIFF_DELETE, text1[-x - 1]))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   471
          last_op = self.DIFF_DELETE
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   472
          break
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   473
        elif (x, y - 1) in v_map[d]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   474
          y -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   475
          if last_op == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   476
            path[-1] = (self.DIFF_INSERT, path[-1][1] + text2[-y - 1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   477
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   478
            path.append((self.DIFF_INSERT, text2[-y - 1]))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   479
          last_op = self.DIFF_INSERT
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   480
          break
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   481
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   482
          x -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   483
          y -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   484
          assert text1[-x - 1] == text2[-y - 1], ("No diagonal.  " +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   485
              "Can't happen. (diff_path2)")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   486
          if last_op == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   487
            path[-1] = (self.DIFF_EQUAL, path[-1][1] + text1[-x - 1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   488
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   489
            path.append((self.DIFF_EQUAL, text1[-x - 1]))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   490
          last_op = self.DIFF_EQUAL
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   491
    return path
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   492
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   493
  def diff_commonPrefix(self, text1, text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   494
    """Determine the common prefix of two strings.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   495
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   496
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   497
      text1: First string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   498
      text2: Second string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   499
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   500
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   501
      The number of characters common to the start of each string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   502
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   503
    # Quick check for common null cases.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   504
    if not text1 or not text2 or text1[0] != text2[0]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   505
      return 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   506
    # Binary search.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   507
    # Performance analysis: http://neil.fraser.name/news/2007/10/09/
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   508
    pointermin = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   509
    pointermax = min(len(text1), len(text2))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   510
    pointermid = pointermax
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   511
    pointerstart = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   512
    while pointermin < pointermid:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   513
      if text1[pointerstart:pointermid] == text2[pointerstart:pointermid]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   514
        pointermin = pointermid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   515
        pointerstart = pointermin
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   516
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   517
        pointermax = pointermid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   518
      pointermid = int((pointermax - pointermin) / 2 + pointermin)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   519
    return pointermid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   520
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   521
  def diff_commonSuffix(self, text1, text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   522
    """Determine the common suffix of two strings.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   523
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   524
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   525
      text1: First string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   526
      text2: Second string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   527
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   528
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   529
      The number of characters common to the end of each string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   530
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   531
    # Quick check for common null cases.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   532
    if not text1 or not text2 or text1[-1] != text2[-1]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   533
      return 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   534
    # Binary search.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   535
    # Performance analysis: http://neil.fraser.name/news/2007/10/09/
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   536
    pointermin = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   537
    pointermax = min(len(text1), len(text2))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   538
    pointermid = pointermax
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   539
    pointerend = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   540
    while pointermin < pointermid:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   541
      if (text1[-pointermid:len(text1) - pointerend] ==
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   542
          text2[-pointermid:len(text2) - pointerend]):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   543
        pointermin = pointermid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   544
        pointerend = pointermin
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   545
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   546
        pointermax = pointermid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   547
      pointermid = int((pointermax - pointermin) / 2 + pointermin)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   548
    return pointermid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   549
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   550
  def diff_halfMatch(self, text1, text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   551
    """Do the two texts share a substring which is at least half the length of
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   552
    the longer text?
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   553
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   554
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   555
      text1: First string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   556
      text2: Second string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   557
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   558
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   559
      Five element Array, containing the prefix of text1, the suffix of text1,
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   560
      the prefix of text2, the suffix of text2 and the common middle.  Or None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   561
      if there was no match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   562
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   563
    if len(text1) > len(text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   564
      (longtext, shorttext) = (text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   565
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   566
      (shorttext, longtext) = (text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   567
    if len(longtext) < 10 or len(shorttext) < 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   568
      return None  # Pointless.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   569
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   570
    def diff_halfMatchI(longtext, shorttext, i):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   571
      """Does a substring of shorttext exist within longtext such that the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   572
      substring is at least half the length of longtext?
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   573
      Closure, but does not reference any external variables.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   574
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   575
      Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   576
        longtext: Longer string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   577
        shorttext: Shorter string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   578
        i: Start index of quarter length substring within longtext.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   579
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   580
      Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   581
        Five element Array, containing the prefix of longtext, the suffix of
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   582
        longtext, the prefix of shorttext, the suffix of shorttext and the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   583
        common middle.  Or None if there was no match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   584
      """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   585
      seed = longtext[i:i + len(longtext) / 4]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   586
      best_common = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   587
      j = shorttext.find(seed)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   588
      while j != -1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   589
        prefixLength = self.diff_commonPrefix(longtext[i:], shorttext[j:])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   590
        suffixLength = self.diff_commonSuffix(longtext[:i], shorttext[:j])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   591
        if len(best_common) < suffixLength + prefixLength:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   592
          best_common = (shorttext[j - suffixLength:j] +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   593
              shorttext[j:j + prefixLength])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   594
          best_longtext_a = longtext[:i - suffixLength]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   595
          best_longtext_b = longtext[i + prefixLength:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   596
          best_shorttext_a = shorttext[:j - suffixLength]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   597
          best_shorttext_b = shorttext[j + prefixLength:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   598
        j = shorttext.find(seed, j + 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   599
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   600
      if len(best_common) >= len(longtext) / 2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   601
        return (best_longtext_a, best_longtext_b,
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   602
                best_shorttext_a, best_shorttext_b, best_common)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   603
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   604
        return None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   605
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   606
    # First check if the second quarter is the seed for a half-match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   607
    hm1 = diff_halfMatchI(longtext, shorttext, (len(longtext) + 3) / 4)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   608
    # Check again based on the third quarter.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   609
    hm2 = diff_halfMatchI(longtext, shorttext, (len(longtext) + 1) / 2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   610
    if not hm1 and not hm2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   611
      return None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   612
    elif not hm2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   613
      hm = hm1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   614
    elif not hm1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   615
      hm = hm2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   616
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   617
      # Both matched.  Select the longest.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   618
      if len(hm1[4]) > len(hm2[4]):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   619
        hm = hm1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   620
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   621
        hm = hm2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   622
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   623
    # A half-match was found, sort out the return data.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   624
    if len(text1) > len(text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   625
      (text1_a, text1_b, text2_a, text2_b, mid_common) = hm
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   626
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   627
      (text2_a, text2_b, text1_a, text1_b, mid_common) = hm
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   628
    return (text1_a, text1_b, text2_a, text2_b, mid_common)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   629
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   630
  def diff_cleanupSemantic(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   631
    """Reduce the number of edits by eliminating semantically trivial
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   632
    equalities.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   633
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   634
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   635
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   636
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   637
    changes = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   638
    equalities = []  # Stack of indices where equalities are found.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   639
    lastequality = None  # Always equal to equalities[-1][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   640
    pointer = 0  # Index of current position.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   641
    length_changes1 = 0  # Number of chars that changed prior to the equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   642
    length_changes2 = 0  # Number of chars that changed after the equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   643
    while pointer < len(diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   644
      if diffs[pointer][0] == self.DIFF_EQUAL:  # equality found
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   645
        equalities.append(pointer)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   646
        length_changes1 = length_changes2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   647
        length_changes2 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   648
        lastequality = diffs[pointer][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   649
      else:  # an insertion or deletion
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   650
        length_changes2 += len(diffs[pointer][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   651
        if (lastequality != None and (len(lastequality) <= length_changes1) and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   652
            (len(lastequality) <= length_changes2)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   653
          # Duplicate record
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   654
          diffs.insert(equalities[-1], (self.DIFF_DELETE, lastequality))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   655
          # Change second copy to insert.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   656
          diffs[equalities[-1] + 1] = (self.DIFF_INSERT,
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   657
              diffs[equalities[-1] + 1][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   658
          # Throw away the equality we just deleted.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   659
          equalities.pop()
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   660
          # Throw away the previous equality (it needs to be reevaluated).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   661
          if len(equalities) != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   662
            equalities.pop()
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   663
          if len(equalities):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   664
            pointer = equalities[-1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   665
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   666
            pointer = -1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   667
          length_changes1 = 0  # Reset the counters.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   668
          length_changes2 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   669
          lastequality = None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   670
          changes = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   671
      pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   672
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   673
    if changes:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   674
      self.diff_cleanupMerge(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   675
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   676
    self.diff_cleanupSemanticLossless(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   677
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   678
  def diff_cleanupSemanticLossless(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   679
    """Look for single edits surrounded on both sides by equalities
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   680
    which can be shifted sideways to align the edit to a word boundary.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   681
    e.g: The c<ins>at c</ins>ame. -> The <ins>cat </ins>came.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   682
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   683
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   684
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   685
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   686
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   687
    def diff_cleanupSemanticScore(one, two):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   688
      """Given two strings, compute a score representing whether the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   689
      internal boundary falls on logical boundaries.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   690
      Scores range from 5 (best) to 0 (worst).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   691
      Closure, but does not reference any external variables.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   692
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   693
      Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   694
        one: First string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   695
        two: Second string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   696
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   697
      Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   698
        The score.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   699
      """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   700
      if not one or not two:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   701
        # Edges are the best.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   702
        return 5
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   703
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   704
      # Each port of this function behaves slightly differently due to
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   705
      # subtle differences in each language's definition of things like
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   706
      # 'whitespace'.  Since this function's purpose is largely cosmetic,
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   707
      # the choice has been made to use each language's native features
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   708
      # rather than force total conformity.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   709
      score = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   710
      # One point for non-alphanumeric.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   711
      if not one[-1].isalnum() or not two[0].isalnum():
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   712
        score += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   713
        # Two points for whitespace.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   714
        if one[-1].isspace() or two[0].isspace():
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   715
          score += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   716
          # Three points for line breaks.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   717
          if (one[-1] == "\r" or one[-1] == "\n" or
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   718
              two[0] == "\r" or two[0] == "\n"):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   719
            score += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   720
            # Four points for blank lines.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   721
            if (re.search("\\n\\r?\\n$", one) or
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   722
                re.match("^\\r?\\n\\r?\\n", two)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   723
              score += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   724
      return score
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   725
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   726
    pointer = 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   727
    # Intentionally ignore the first and last element (don't need checking).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   728
    while pointer < len(diffs) - 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   729
      if (diffs[pointer - 1][0] == self.DIFF_EQUAL and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   730
          diffs[pointer + 1][0] == self.DIFF_EQUAL):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   731
        # This is a single edit surrounded by equalities.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   732
        equality1 = diffs[pointer - 1][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   733
        edit = diffs[pointer][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   734
        equality2 = diffs[pointer + 1][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   735
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   736
        # First, shift the edit as far left as possible.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   737
        commonOffset = self.diff_commonSuffix(equality1, edit)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   738
        if commonOffset:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   739
          commonString = edit[-commonOffset:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   740
          equality1 = equality1[:-commonOffset]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   741
          edit = commonString + edit[:-commonOffset]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   742
          equality2 = commonString + equality2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   743
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   744
        # Second, step character by character right, looking for the best fit.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   745
        bestEquality1 = equality1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   746
        bestEdit = edit
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   747
        bestEquality2 = equality2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   748
        bestScore = (diff_cleanupSemanticScore(equality1, edit) +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   749
            diff_cleanupSemanticScore(edit, equality2))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   750
        while edit and equality2 and edit[0] == equality2[0]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   751
          equality1 += edit[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   752
          edit = edit[1:] + equality2[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   753
          equality2 = equality2[1:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   754
          score = (diff_cleanupSemanticScore(equality1, edit) +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   755
              diff_cleanupSemanticScore(edit, equality2))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   756
          # The >= encourages trailing rather than leading whitespace on edits.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   757
          if score >= bestScore:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   758
            bestScore = score
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   759
            bestEquality1 = equality1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   760
            bestEdit = edit
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   761
            bestEquality2 = equality2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   762
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   763
        if diffs[pointer - 1][1] != bestEquality1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   764
          # We have an improvement, save it back to the diff.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   765
          if bestEquality1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   766
            diffs[pointer - 1] = (diffs[pointer - 1][0], bestEquality1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   767
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   768
            del diffs[pointer - 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   769
            pointer -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   770
          diffs[pointer] = (diffs[pointer][0], bestEdit)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   771
          if bestEquality2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   772
            diffs[pointer + 1] = (diffs[pointer + 1][0], bestEquality2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   773
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   774
            del diffs[pointer + 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   775
            pointer -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   776
      pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   777
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   778
  def diff_cleanupEfficiency(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   779
    """Reduce the number of edits by eliminating operationally trivial
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   780
    equalities.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   781
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   782
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   783
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   784
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   785
    changes = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   786
    equalities = []  # Stack of indices where equalities are found.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   787
    lastequality = ''  # Always equal to equalities[-1][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   788
    pointer = 0  # Index of current position.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   789
    pre_ins = False  # Is there an insertion operation before the last equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   790
    pre_del = False  # Is there a deletion operation before the last equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   791
    post_ins = False  # Is there an insertion operation after the last equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   792
    post_del = False  # Is there a deletion operation after the last equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   793
    while pointer < len(diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   794
      if diffs[pointer][0] == self.DIFF_EQUAL:  # equality found
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   795
        if (len(diffs[pointer][1]) < self.Diff_EditCost and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   796
            (post_ins or post_del)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   797
          # Candidate found.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   798
          equalities.append(pointer)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   799
          pre_ins = post_ins
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   800
          pre_del = post_del
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   801
          lastequality = diffs[pointer][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   802
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   803
          # Not a candidate, and can never become one.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   804
          equalities = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   805
          lastequality = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   806
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   807
        post_ins = post_del = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   808
      else:  # an insertion or deletion
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   809
        if diffs[pointer][0] == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   810
          post_del = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   811
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   812
          post_ins = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   813
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   814
        # Five types to be split:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   815
        # <ins>A</ins><del>B</del>XY<ins>C</ins><del>D</del>
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   816
        # <ins>A</ins>X<ins>C</ins><del>D</del>
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   817
        # <ins>A</ins><del>B</del>X<ins>C</ins>
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   818
        # <ins>A</del>X<ins>C</ins><del>D</del>
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   819
        # <ins>A</ins><del>B</del>X<del>C</del>
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   820
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   821
        if lastequality and ((pre_ins and pre_del and post_ins and post_del) or
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   822
                             ((len(lastequality) < self.Diff_EditCost / 2) and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   823
                              (pre_ins + pre_del + post_ins + post_del) == 3)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   824
          # Duplicate record
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   825
          diffs.insert(equalities[-1], (self.DIFF_DELETE, lastequality))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   826
          # Change second copy to insert.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   827
          diffs[equalities[-1] + 1] = (self.DIFF_INSERT,
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   828
              diffs[equalities[-1] + 1][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   829
          equalities.pop()  # Throw away the equality we just deleted
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   830
          lastequality = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   831
          if pre_ins and pre_del:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   832
            # No changes made which could affect previous entry, keep going.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   833
            post_ins = post_del = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   834
            equalities = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   835
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   836
            if len(equalities):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   837
              equalities.pop()  # Throw away the previous equality
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   838
            if len(equalities):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   839
              pointer = equalities[-1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   840
            else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   841
              pointer = -1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   842
            post_ins = post_del = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   843
          changes = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   844
      pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   845
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   846
    if changes:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   847
      self.diff_cleanupMerge(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   848
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   849
  def diff_cleanupMerge(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   850
    """Reorder and merge like edit sections.  Merge equalities.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   851
    Any edit section can move as long as it doesn't cross an equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   852
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   853
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   854
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   855
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   856
    diffs.append((self.DIFF_EQUAL, ''))  # Add a dummy entry at the end.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   857
    pointer = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   858
    count_delete = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   859
    count_insert = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   860
    text_delete = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   861
    text_insert = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   862
    while pointer < len(diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   863
      if diffs[pointer][0] == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   864
        count_insert += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   865
        text_insert += diffs[pointer][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   866
        pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   867
      elif diffs[pointer][0] == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   868
        count_delete += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   869
        text_delete += diffs[pointer][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   870
        pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   871
      elif diffs[pointer][0] == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   872
        # Upon reaching an equality, check for prior redundancies.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   873
        if count_delete != 0 or count_insert != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   874
          if count_delete != 0 and count_insert != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   875
            # Factor out any common prefixies.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   876
            commonlength = self.diff_commonPrefix(text_insert, text_delete)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   877
            if commonlength != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   878
              x = pointer - count_delete - count_insert - 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   879
              if x >= 0 and diffs[x][0] == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   880
                diffs[x] = (diffs[x][0], diffs[x][1] +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   881
                            text_insert[:commonlength])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   882
              else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   883
                diffs.insert(0, (self.DIFF_EQUAL, text_insert[:commonlength]))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   884
                pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   885
              text_insert = text_insert[commonlength:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   886
              text_delete = text_delete[commonlength:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   887
            # Factor out any common suffixies.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   888
            commonlength = self.diff_commonSuffix(text_insert, text_delete)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   889
            if commonlength != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   890
              diffs[pointer] = (diffs[pointer][0], text_insert[-commonlength:] +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   891
                  diffs[pointer][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   892
              text_insert = text_insert[:-commonlength]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   893
              text_delete = text_delete[:-commonlength]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   894
          # Delete the offending records and add the merged ones.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   895
          if count_delete == 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   896
            diffs[pointer - count_insert : pointer] = [
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   897
                (self.DIFF_INSERT, text_insert)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   898
          elif count_insert == 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   899
            diffs[pointer - count_delete : pointer] = [
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   900
                (self.DIFF_DELETE, text_delete)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   901
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   902
            diffs[pointer - count_delete - count_insert : pointer] = [
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   903
                (self.DIFF_DELETE, text_delete),
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   904
                (self.DIFF_INSERT, text_insert)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   905
          pointer = pointer - count_delete - count_insert + 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   906
          if count_delete != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   907
            pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   908
          if count_insert != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   909
            pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   910
        elif pointer != 0 and diffs[pointer - 1][0] == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   911
          # Merge this equality with the previous one.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   912
          diffs[pointer - 1] = (diffs[pointer - 1][0],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   913
                                diffs[pointer - 1][1] + diffs[pointer][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   914
          del diffs[pointer]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   915
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   916
          pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   917
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   918
        count_insert = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   919
        count_delete = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   920
        text_delete = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   921
        text_insert = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   922
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   923
    if diffs[-1][1] == '':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   924
      diffs.pop()  # Remove the dummy entry at the end.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   925
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   926
    # Second pass: look for single edits surrounded on both sides by equalities
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   927
    # which can be shifted sideways to eliminate an equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   928
    # e.g: A<ins>BA</ins>C -> <ins>AB</ins>AC
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   929
    changes = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   930
    pointer = 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   931
    # Intentionally ignore the first and last element (don't need checking).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   932
    while pointer < len(diffs) - 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   933
      if (diffs[pointer - 1][0] == self.DIFF_EQUAL and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   934
          diffs[pointer + 1][0] == self.DIFF_EQUAL):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   935
        # This is a single edit surrounded by equalities.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   936
        if diffs[pointer][1].endswith(diffs[pointer - 1][1]):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   937
          # Shift the edit over the previous equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   938
          diffs[pointer] = (diffs[pointer][0],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   939
              diffs[pointer - 1][1] +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   940
              diffs[pointer][1][:-len(diffs[pointer - 1][1])])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   941
          diffs[pointer + 1] = (diffs[pointer + 1][0],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   942
                                diffs[pointer - 1][1] + diffs[pointer + 1][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   943
          del diffs[pointer - 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   944
          changes = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   945
        elif diffs[pointer][1].startswith(diffs[pointer + 1][1]):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   946
          # Shift the edit over the next equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   947
          diffs[pointer - 1] = (diffs[pointer - 1][0],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   948
                                diffs[pointer - 1][1] + diffs[pointer + 1][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   949
          diffs[pointer] = (diffs[pointer][0],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   950
              diffs[pointer][1][len(diffs[pointer + 1][1]):] +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   951
              diffs[pointer + 1][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   952
          del diffs[pointer + 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   953
          changes = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   954
      pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   955
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   956
    # If shifts were made, the diff needs reordering and another shift sweep.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   957
    if changes:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   958
      self.diff_cleanupMerge(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   959
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   960
  def diff_xIndex(self, diffs, loc):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   961
    """loc is a location in text1, compute and return the equivalent location
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   962
    in text2.  e.g. "The cat" vs "The big cat", 1->1, 5->8
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   963
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   964
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   965
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   966
      loc: Location within text1.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   967
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   968
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   969
      Location within text2.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   970
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   971
    chars1 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   972
    chars2 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   973
    last_chars1 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   974
    last_chars2 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   975
    for x in xrange(len(diffs)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   976
      (op, text) = diffs[x]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   977
      if op != self.DIFF_INSERT:  # Equality or deletion.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   978
        chars1 += len(text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   979
      if op != self.DIFF_DELETE:  # Equality or insertion.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   980
        chars2 += len(text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   981
      if chars1 > loc:  # Overshot the location.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   982
        break
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   983
      last_chars1 = chars1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   984
      last_chars2 = chars2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   985
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   986
    if len(diffs) != x and diffs[x][0] == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   987
      # The location was deleted.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   988
      return last_chars2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   989
    # Add the remaining len(character).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   990
    return last_chars2 + (loc - last_chars1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   991
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   992
  def diff_prettyHtml(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   993
    """Convert a diff array into a pretty HTML report.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   994
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   995
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   996
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   997
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   998
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   999
      HTML representation.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1000
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1001
    html = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1002
    i = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1003
    for (op, data) in diffs:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1004
      text = (data.replace("&", "&amp;").replace("<", "&lt;")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1005
                 .replace(">", "&gt;").replace("\n", "&para;<BR>"))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1006
      if op == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1007
        html.append("<INS STYLE=\"background:#E6FFE6;\" TITLE=\"i=%i\">%s</INS>"
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1008
            % (i, text))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1009
      elif op == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1010
        html.append("<DEL STYLE=\"background:#FFE6E6;\" TITLE=\"i=%i\">%s</DEL>"
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1011
            % (i, text))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1012
      elif op == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1013
        html.append("<SPAN TITLE=\"i=%i\">%s</SPAN>" % (i, text))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1014
      if op != self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1015
        i += len(data)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1016
    return "".join(html)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1017
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1018
  def diff_text1(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1019
    """Compute and return the source text (all equalities and deletions).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1020
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1021
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1022
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1023
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1024
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1025
      Source text.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1026
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1027
    text = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1028
    for (op, data) in diffs:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1029
      if op != self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1030
        text.append(data)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1031
    return "".join(text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1032
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1033
  def diff_text2(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1034
    """Compute and return the destination text (all equalities and insertions).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1035
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1036
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1037
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1038
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1039
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1040
      Destination text.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1041
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1042
    text = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1043
    for (op, data) in diffs:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1044
      if op != self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1045
        text.append(data)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1046
    return "".join(text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1047
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1048
  def diff_levenshtein(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1049
    """Compute the Levenshtein distance; the number of inserted, deleted or
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1050
    substituted characters.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1051
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1052
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1053
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1054
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1055
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1056
      Number of changes.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1057
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1058
    levenshtein = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1059
    insertions = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1060
    deletions = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1061
    for (op, data) in diffs:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1062
      if op == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1063
        insertions += len(data)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1064
      elif op == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1065
        deletions += len(data)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1066
      elif op == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1067
        # A deletion and an insertion is one substitution.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1068
        levenshtein += max(insertions, deletions)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1069
        insertions = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1070
        deletions = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1071
    levenshtein += max(insertions, deletions)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1072
    return levenshtein
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1073
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1074
  def diff_toDelta(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1075
    """Crush the diff into an encoded string which describes the operations
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1076
    required to transform text1 into text2.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1077
    E.g. =3\t-2\t+ing  -> Keep 3 chars, delete 2 chars, insert 'ing'.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1078
    Operations are tab-separated.  Inserted text is escaped using %xx notation.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1079
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1080
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1081
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1082
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1083
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1084
      Delta text.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1085
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1086
    text = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1087
    for (op, data) in diffs:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1088
      if op == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1089
        # High ascii will raise UnicodeDecodeError.  Use Unicode instead.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1090
        data = data.encode("utf-8")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1091
        text.append("+" + urllib.quote(data, "!~*'();/?:@&=+$,# "))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1092
      elif op == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1093
        text.append("-%d" % len(data))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1094
      elif op == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1095
        text.append("=%d" % len(data))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1096
    return "\t".join(text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1097
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1098
  def diff_fromDelta(self, text1, delta):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1099
    """Given the original text1, and an encoded string which describes the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1100
    operations required to transform text1 into text2, compute the full diff.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1101
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1102
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1103
      text1: Source string for the diff.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1104
      delta: Delta text.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1105
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1106
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1107
      Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1108
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1109
    Raises:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1110
      ValueError: If invalid input.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1111
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1112
    if type(delta) == unicode:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1113
      # Deltas should be composed of a subset of ascii chars, Unicode not
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1114
      # required.  If this encode raises UnicodeEncodeError, delta is invalid.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1115
      delta = delta.encode("ascii")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1116
    diffs = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1117
    pointer = 0  # Cursor in text1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1118
    tokens = delta.split("\t")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1119
    for token in tokens:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1120
      if token == "":
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1121
        # Blank tokens are ok (from a trailing \t).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1122
        continue
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1123
      # Each token begins with a one character parameter which specifies the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1124
      # operation of this token (delete, insert, equality).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1125
      param = token[1:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1126
      if token[0] == "+":
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1127
        param = urllib.unquote(param).decode("utf-8")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1128
        diffs.append((self.DIFF_INSERT, param))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1129
      elif token[0] == "-" or token[0] == "=":
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1130
        try:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1131
          n = int(param)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1132
        except ValueError:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1133
          raise ValueError, "Invalid number in diff_fromDelta: " + param
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1134
        if n < 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1135
          raise ValueError, "Negative number in diff_fromDelta: " + param
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1136
        text = text1[pointer : pointer + n]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1137
        pointer += n
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1138
        if token[0] == "=":
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1139
          diffs.append((self.DIFF_EQUAL, text))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1140
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1141
          diffs.append((self.DIFF_DELETE, text))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1142
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1143
        # Anything else is an error.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1144
        raise ValueError, ("Invalid diff operation in diff_fromDelta: " +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1145
            token[0])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1146
    if pointer != len(text1):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1147
      raise ValueError, (
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1148
          "Delta length (%d) does not equal source text length (%d)." %
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1149
         (pointer, len(text1)))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1150
    return diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1151
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1152
  #  MATCH FUNCTIONS
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1153
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1154
  def match_main(self, text, pattern, loc):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1155
    """Locate the best instance of 'pattern' in 'text' near 'loc'.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1156
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1157
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1158
      text: The text to search.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1159
      pattern: The pattern to search for.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1160
      loc: The location to search around.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1161
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1162
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1163
      Best match index or -1.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1164
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1165
    loc = max(0, min(loc, len(text)))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1166
    if text == pattern:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1167
      # Shortcut (potentially not guaranteed by the algorithm)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1168
      return 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1169
    elif not text:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1170
      # Nothing to match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1171
      return -1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1172
    elif text[loc:loc + len(pattern)] == pattern:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1173
      # Perfect match at the perfect spot!  (Includes case of null pattern)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1174
      return loc
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1175
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1176
      # Do a fuzzy compare.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1177
      match = self.match_bitap(text, pattern, loc)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1178
      return match
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1179
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1180
  def match_bitap(self, text, pattern, loc):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1181
    """Locate the best instance of 'pattern' in 'text' near 'loc' using the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1182
    Bitap algorithm.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1183
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1184
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1185
      text: The text to search.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1186
      pattern: The pattern to search for.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1187
      loc: The location to search around.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1188
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1189
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1190
      Best match index or -1.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1191
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1192
    # Python doesn't have a maxint limit, so ignore this check.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1193
    #if self.Match_MaxBits != 0 and len(pattern) > self.Match_MaxBits:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1194
    #  raise ValueError("Pattern too long for this application.")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1195
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1196
    # Initialise the alphabet.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1197
    s = self.match_alphabet(pattern)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1198
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1199
    def match_bitapScore(e, x):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1200
      """Compute and return the score for a match with e errors and x location.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1201
      Accesses loc and pattern through being a closure.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1202
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1203
      Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1204
        e: Number of errors in match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1205
        x: Location of match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1206
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1207
      Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1208
        Overall score for match (0.0 = good, 1.0 = bad).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1209
      """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1210
      accuracy = float(e) / len(pattern)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1211
      proximity = abs(loc - x)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1212
      if not self.Match_Distance:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1213
        # Dodge divide by zero error.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1214
        return proximity and 1.0 or accuracy
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1215
      return accuracy + (proximity / float(self.Match_Distance))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1216
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1217
    # Highest score beyond which we give up.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1218
    score_threshold = self.Match_Threshold
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1219
    # Is there a nearby exact match? (speedup)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1220
    best_loc = text.find(pattern, loc)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1221
    if best_loc != -1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1222
      score_threshold = min(match_bitapScore(0, best_loc), score_threshold)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1223
      # What about in the other direction? (speedup)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1224
      best_loc = text.rfind(pattern, loc + len(pattern))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1225
      if best_loc != -1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1226
        score_threshold = min(match_bitapScore(0, best_loc), score_threshold)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1227
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1228
    # Initialise the bit arrays.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1229
    matchmask = 1 << (len(pattern) - 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1230
    best_loc = -1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1231
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1232
    bin_max = len(pattern) + len(text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1233
    # Empty initialization added to appease pychecker.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1234
    last_rd = None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1235
    for d in xrange(len(pattern)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1236
      # Scan for the best match each iteration allows for one more error.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1237
      # Run a binary search to determine how far from 'loc' we can stray at
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1238
      # this error level.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1239
      bin_min = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1240
      bin_mid = bin_max
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1241
      while bin_min < bin_mid:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1242
        if match_bitapScore(d, loc + bin_mid) <= score_threshold:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1243
          bin_min = bin_mid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1244
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1245
          bin_max = bin_mid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1246
        bin_mid = (bin_max - bin_min) / 2 + bin_min
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1247
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1248
      # Use the result from this iteration as the maximum for the next.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1249
      bin_max = bin_mid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1250
      start = max(1, loc - bin_mid + 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1251
      finish = min(loc + bin_mid, len(text)) + len(pattern)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1252
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1253
      rd = range(finish + 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1254
      rd.append((1 << d) - 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1255
      for j in xrange(finish, start - 1, -1):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1256
        if len(text) <= j - 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1257
          # Out of range.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1258
          charMatch = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1259
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1260
          charMatch = s.get(text[j - 1], 0)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1261
        if d == 0:  # First pass: exact match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1262
          rd[j] = ((rd[j + 1] << 1) | 1) & charMatch
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1263
        else:  # Subsequent passes: fuzzy match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1264
          rd[j] = ((rd[j + 1] << 1) | 1) & charMatch | (
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1265
              ((last_rd[j + 1] | last_rd[j]) << 1) | 1) | last_rd[j + 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1266
        if rd[j] & matchmask:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1267
          score = match_bitapScore(d, j - 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1268
          # This match will almost certainly be better than any existing match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1269
          # But check anyway.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1270
          if score <= score_threshold:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1271
            # Told you so.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1272
            score_threshold = score
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1273
            best_loc = j - 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1274
            if best_loc > loc:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1275
              # When passing loc, don't exceed our current distance from loc.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1276
              start = max(1, 2 * loc - best_loc)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1277
            else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1278
              # Already passed loc, downhill from here on in.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1279
              break
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1280
      # No hope for a (better) match at greater error levels.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1281
      if match_bitapScore(d + 1, loc) > score_threshold:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1282
        break
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1283
      last_rd = rd
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1284
    return best_loc
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1285
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1286
  def match_alphabet(self, pattern):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1287
    """Initialise the alphabet for the Bitap algorithm.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1288
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1289
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1290
      pattern: The text to encode.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1291
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1292
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1293
      Hash of character locations.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1294
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1295
    s = {}
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1296
    for char in pattern:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1297
      s[char] = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1298
    for i in xrange(len(pattern)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1299
      s[pattern[i]] |= 1 << (len(pattern) - i - 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1300
    return s
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1301
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1302
  #  PATCH FUNCTIONS
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1303
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1304
  def patch_addContext(self, patch, text):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1305
    """Increase the context until it is unique,
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1306
    but don't let the pattern expand beyond Match_MaxBits.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1307
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1308
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1309
      patch: The patch to grow.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1310
      text: Source text.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1311
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1312
    if len(text) == 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1313
      return
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1314
    pattern = text[patch.start2 : patch.start2 + patch.length1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1315
    padding = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1316
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1317
    # Look for the first and last matches of pattern in text.  If two different
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1318
    # matches are found, increase the pattern length.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1319
    while (text.find(pattern) != text.rfind(pattern) and (self.Match_MaxBits ==
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1320
        0 or len(pattern) < self.Match_MaxBits - self.Patch_Margin -
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1321
        self.Patch_Margin)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1322
      padding += self.Patch_Margin
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1323
      pattern = text[max(0, patch.start2 - padding) :
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1324
                     patch.start2 + patch.length1 + padding]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1325
    # Add one chunk for good luck.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1326
    padding += self.Patch_Margin
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1327
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1328
    # Add the prefix.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1329
    prefix = text[max(0, patch.start2 - padding) : patch.start2]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1330
    if prefix:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1331
      patch.diffs[:0] = [(self.DIFF_EQUAL, prefix)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1332
    # Add the suffix.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1333
    suffix = text[patch.start2 + patch.length1 :
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1334
                  patch.start2 + patch.length1 + padding]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1335
    if suffix:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1336
      patch.diffs.append((self.DIFF_EQUAL, suffix))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1337
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1338
    # Roll back the start points.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1339
    patch.start1 -= len(prefix)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1340
    patch.start2 -= len(prefix)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1341
    # Extend lengths.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1342
    patch.length1 += len(prefix) + len(suffix)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1343
    patch.length2 += len(prefix) + len(suffix)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1344
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1345
  def patch_make(self, a, b=None, c=None):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1346
    """Compute a list of patches to turn text1 into text2.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1347
    Use diffs if provided, otherwise compute it ourselves.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1348
    There are four ways to call this function, depending on what data is
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1349
    available to the caller:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1350
    Method 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1351
    a = text1, b = text2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1352
    Method 2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1353
    a = diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1354
    Method 3 (optimal):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1355
    a = text1, b = diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1356
    Method 4 (deprecated, use method 3):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1357
    a = text1, b = text2, c = diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1358
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1359
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1360
      a: text1 (methods 1,3,4) or Array of diff tuples for text1 to
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1361
          text2 (method 2).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1362
      b: text2 (methods 1,4) or Array of diff tuples for text1 to
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1363
          text2 (method 3) or undefined (method 2).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1364
      c: Array of diff tuples for text1 to text2 (method 4) or
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1365
          undefined (methods 1,2,3).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1366
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1367
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1368
      Array of patch objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1369
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1370
    text1 = None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1371
    diffs = None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1372
    # Note that texts may arrive as 'str' or 'unicode'.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1373
    if isinstance(a, basestring) and isinstance(b, basestring) and c is None:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1374
      # Method 1: text1, text2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1375
      # Compute diffs from text1 and text2.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1376
      text1 = a
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1377
      diffs = self.diff_main(text1, b, True)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1378
      if len(diffs) > 2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1379
        self.diff_cleanupSemantic(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1380
        self.diff_cleanupEfficiency(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1381
    elif isinstance(a, list) and b is None and c is None:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1382
      # Method 2: diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1383
      # Compute text1 from diffs.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1384
      diffs = a
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1385
      text1 = self.diff_text1(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1386
    elif isinstance(a, basestring) and isinstance(b, list) and c is None:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1387
      # Method 3: text1, diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1388
      text1 = a
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1389
      diffs = b
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1390
    elif (isinstance(a, basestring) and isinstance(b, basestring) and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1391
          isinstance(c, list)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1392
      # Method 4: text1, text2, diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1393
      # text2 is not used.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1394
      text1 = a
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1395
      diffs = c
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1396
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1397
      raise ValueError("Unknown call format to patch_make.")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1398
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1399
    if not diffs:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1400
      return []  # Get rid of the None case.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1401
    patches = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1402
    patch = patch_obj()
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1403
    char_count1 = 0  # Number of characters into the text1 string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1404
    char_count2 = 0  # Number of characters into the text2 string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1405
    prepatch_text = text1  # Recreate the patches to determine context info.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1406
    postpatch_text = text1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1407
    for x in xrange(len(diffs)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1408
      (diff_type, diff_text) = diffs[x]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1409
      if len(patch.diffs) == 0 and diff_type != self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1410
        # A new patch starts here.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1411
        patch.start1 = char_count1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1412
        patch.start2 = char_count2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1413
      if diff_type == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1414
        # Insertion
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1415
        patch.diffs.append(diffs[x])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1416
        patch.length2 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1417
        postpatch_text = (postpatch_text[:char_count2] + diff_text +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1418
                          postpatch_text[char_count2:])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1419
      elif diff_type == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1420
        # Deletion.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1421
        patch.length1 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1422
        patch.diffs.append(diffs[x])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1423
        postpatch_text = (postpatch_text[:char_count2] +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1424
                          postpatch_text[char_count2 + len(diff_text):])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1425
      elif (diff_type == self.DIFF_EQUAL and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1426
            len(diff_text) <= 2 * self.Patch_Margin and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1427
            len(patch.diffs) != 0 and len(diffs) != x + 1):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1428
        # Small equality inside a patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1429
        patch.diffs.append(diffs[x])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1430
        patch.length1 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1431
        patch.length2 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1432
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1433
      if (diff_type == self.DIFF_EQUAL and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1434
          len(diff_text) >= 2 * self.Patch_Margin):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1435
        # Time for a new patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1436
        if len(patch.diffs) != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1437
          self.patch_addContext(patch, prepatch_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1438
          patches.append(patch)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1439
          patch = patch_obj()
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1440
          # Unlike Unidiff, our patch lists have a rolling context.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1441
          # http://code.google.com/p/google-diff-match-patch/wiki/Unidiff
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1442
          # Update prepatch text & pos to reflect the application of the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1443
          # just completed patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1444
          prepatch_text = postpatch_text
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1445
          char_count1 = char_count2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1446
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1447
      # Update the current character count.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1448
      if diff_type != self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1449
        char_count1 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1450
      if diff_type != self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1451
        char_count2 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1452
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1453
    # Pick up the leftover patch if not empty.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1454
    if len(patch.diffs) != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1455
      self.patch_addContext(patch, prepatch_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1456
      patches.append(patch)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1457
    return patches
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1458
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1459
  def patch_deepCopy(self, patches):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1460
    """Given an array of patches, return another array that is identical.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1461
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1462
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1463
      patches: Array of patch objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1464
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1465
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1466
      Array of patch objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1467
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1468
    patchesCopy = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1469
    for patch in patches:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1470
      patchCopy = patch_obj()
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1471
      # No need to deep copy the tuples since they are immutable.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1472
      patchCopy.diffs = patch.diffs[:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1473
      patchCopy.start1 = patch.start1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1474
      patchCopy.start2 = patch.start2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1475
      patchCopy.length1 = patch.length1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1476
      patchCopy.length2 = patch.length2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1477
      patchesCopy.append(patchCopy)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1478
    return patchesCopy
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1479
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1480
  def patch_apply(self, patches, text):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1481
    """Merge a set of patches onto the text.  Return a patched text, as well
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1482
    as a list of true/false values indicating which patches were applied.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1483
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1484
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1485
      patches: Array of patch objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1486
      text: Old text.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1487
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1488
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1489
      Two element Array, containing the new text and an array of boolean values.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1490
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1491
    if not patches:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1492
      return (text, [])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1493
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1494
    # Deep copy the patches so that no changes are made to originals.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1495
    patches = self.patch_deepCopy(patches)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1496
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1497
    nullPadding = self.patch_addPadding(patches)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1498
    text = nullPadding + text + nullPadding
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1499
    self.patch_splitMax(patches)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1500
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1501
    # delta keeps track of the offset between the expected and actual location
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1502
    # of the previous patch.  If there are patches expected at positions 10 and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1503
    # 20, but the first patch was found at 12, delta is 2 and the second patch
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1504
    # has an effective expected position of 22.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1505
    delta = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1506
    results = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1507
    for patch in patches:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1508
      expected_loc = patch.start2 + delta
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1509
      text1 = self.diff_text1(patch.diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1510
      end_loc = -1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1511
      if len(text1) > self.Match_MaxBits:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1512
        # patch_splitMax will only provide an oversized pattern in the case of
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1513
        # a monster delete.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1514
        start_loc = self.match_main(text, text1[:self.Match_MaxBits],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1515
                                    expected_loc)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1516
        if start_loc != -1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1517
          end_loc = self.match_main(text, text1[-self.Match_MaxBits:],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1518
              expected_loc + len(text1) - self.Match_MaxBits)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1519
          if end_loc == -1 or start_loc >= end_loc:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1520
            # Can't find valid trailing context.  Drop this patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1521
            start_loc = -1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1522
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1523
        start_loc = self.match_main(text, text1, expected_loc)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1524
      if start_loc == -1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1525
        # No match found.  :(
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1526
        results.append(False)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1527
        # Subtract the delta for this failed patch from subsequent patches.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1528
        delta -= patch.length2 - patch.length1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1529
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1530
        # Found a match.  :)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1531
        results.append(True)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1532
        delta = start_loc - expected_loc
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1533
        if end_loc == -1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1534
          text2 = text[start_loc : start_loc + len(text1)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1535
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1536
          text2 = text[start_loc : end_loc + self.Match_MaxBits]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1537
        if text1 == text2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1538
          # Perfect match, just shove the replacement text in.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1539
          text = (text[:start_loc] + self.diff_text2(patch.diffs) +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1540
                      text[start_loc + len(text1):])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1541
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1542
          # Imperfect match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1543
          # Run a diff to get a framework of equivalent indices.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1544
          diffs = self.diff_main(text1, text2, False)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1545
          if (len(text1) > self.Match_MaxBits and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1546
              self.diff_levenshtein(diffs) / float(len(text1)) >
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1547
              self.Patch_DeleteThreshold):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1548
            # The end points match, but the content is unacceptably bad.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1549
            results[-1] = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1550
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1551
            self.diff_cleanupSemanticLossless(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1552
            index1 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1553
            for (op, data) in patch.diffs:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1554
              if op != self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1555
                index2 = self.diff_xIndex(diffs, index1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1556
              if op == self.DIFF_INSERT:  # Insertion
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1557
                text = text[:start_loc + index2] + data + text[start_loc +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1558
                                                               index2:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1559
              elif op == self.DIFF_DELETE:  # Deletion
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1560
                text = text[:start_loc + index2] + text[start_loc +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1561
                    self.diff_xIndex(diffs, index1 + len(data)):]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1562
              if op != self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1563
                index1 += len(data)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1564
    # Strip the padding off.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1565
    text = text[len(nullPadding):-len(nullPadding)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1566
    return (text, results)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1567
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1568
  def patch_addPadding(self, patches):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1569
    """Add some padding on text start and end so that edges can match
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1570
    something.  Intended to be called only from within patch_apply.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1571
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1572
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1573
      patches: Array of patch objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1574
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1575
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1576
      The padding string added to each side.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1577
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1578
    paddingLength = self.Patch_Margin
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1579
    nullPadding = ""
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1580
    for x in xrange(1, paddingLength + 1):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1581
      nullPadding += chr(x)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1582
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1583
    # Bump all the patches forward.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1584
    for patch in patches:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1585
      patch.start1 += paddingLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1586
      patch.start2 += paddingLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1587
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1588
    # Add some padding on start of first diff.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1589
    patch = patches[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1590
    diffs = patch.diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1591
    if not diffs or diffs[0][0] != self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1592
      # Add nullPadding equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1593
      diffs.insert(0, (self.DIFF_EQUAL, nullPadding))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1594
      patch.start1 -= paddingLength  # Should be 0.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1595
      patch.start2 -= paddingLength  # Should be 0.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1596
      patch.length1 += paddingLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1597
      patch.length2 += paddingLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1598
    elif paddingLength > len(diffs[0][1]):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1599
      # Grow first equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1600
      extraLength = paddingLength - len(diffs[0][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1601
      newText = nullPadding[len(diffs[0][1]):] + diffs[0][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1602
      diffs[0] = (diffs[0][0], newText)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1603
      patch.start1 -= extraLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1604
      patch.start2 -= extraLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1605
      patch.length1 += extraLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1606
      patch.length2 += extraLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1607
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1608
    # Add some padding on end of last diff.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1609
    patch = patches[-1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1610
    diffs = patch.diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1611
    if not diffs or diffs[-1][0] != self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1612
      # Add nullPadding equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1613
      diffs.append((self.DIFF_EQUAL, nullPadding))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1614
      patch.length1 += paddingLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1615
      patch.length2 += paddingLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1616
    elif paddingLength > len(diffs[-1][1]):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1617
      # Grow last equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1618
      extraLength = paddingLength - len(diffs[-1][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1619
      newText = diffs[-1][1] + nullPadding[:extraLength]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1620
      diffs[-1] = (diffs[-1][0], newText)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1621
      patch.length1 += extraLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1622
      patch.length2 += extraLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1623
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1624
    return nullPadding
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1625
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1626
  def patch_splitMax(self, patches):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1627
    """Look through the patches and break up any which are longer than the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1628
    maximum limit of the match algorithm.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1629
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1630
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1631
      patches: Array of patch objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1632
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1633
    if self.Match_MaxBits == 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1634
      return
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1635
    for x in xrange(len(patches)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1636
      if patches[x].length1 > self.Match_MaxBits:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1637
        bigpatch = patches[x]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1638
        # Remove the big old patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1639
        del patches[x]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1640
        x -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1641
        patch_size = self.Match_MaxBits
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1642
        start1 = bigpatch.start1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1643
        start2 = bigpatch.start2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1644
        precontext = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1645
        while len(bigpatch.diffs) != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1646
          # Create one of several smaller patches.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1647
          patch = patch_obj()
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1648
          empty = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1649
          patch.start1 = start1 - len(precontext)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1650
          patch.start2 = start2 - len(precontext)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1651
          if precontext:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1652
            patch.length1 = patch.length2 = len(precontext)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1653
            patch.diffs.append((self.DIFF_EQUAL, precontext))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1654
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1655
          while (len(bigpatch.diffs) != 0 and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1656
                 patch.length1 < patch_size - self.Patch_Margin):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1657
            (diff_type, diff_text) = bigpatch.diffs[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1658
            if diff_type == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1659
              # Insertions are harmless.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1660
              patch.length2 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1661
              start2 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1662
              patch.diffs.append(bigpatch.diffs.pop(0))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1663
              empty = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1664
            elif (diff_type == self.DIFF_DELETE and len(patch.diffs) == 1 and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1665
                patch.diffs[0][0] == self.DIFF_EQUAL and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1666
                len(diff_text) > 2 * patch_size):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1667
              # This is a large deletion.  Let it pass in one chunk.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1668
              patch.length1 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1669
              start1 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1670
              empty = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1671
              patch.diffs.append((diff_type, diff_text))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1672
              del bigpatch.diffs[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1673
            else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1674
              # Deletion or equality.  Only take as much as we can stomach.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1675
              diff_text = diff_text[:patch_size - patch.length1 -
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1676
                                    self.Patch_Margin]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1677
              patch.length1 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1678
              start1 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1679
              if diff_type == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1680
                patch.length2 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1681
                start2 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1682
              else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1683
                empty = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1684
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1685
              patch.diffs.append((diff_type, diff_text))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1686
              if diff_text == bigpatch.diffs[0][1]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1687
                del bigpatch.diffs[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1688
              else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1689
                bigpatch.diffs[0] = (bigpatch.diffs[0][0],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1690
                                     bigpatch.diffs[0][1][len(diff_text):])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1691
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1692
          # Compute the head context for the next patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1693
          precontext = self.diff_text2(patch.diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1694
          precontext = precontext[-self.Patch_Margin:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1695
          # Append the end context for this patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1696
          postcontext = self.diff_text1(bigpatch.diffs)[:self.Patch_Margin]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1697
          if postcontext:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1698
            patch.length1 += len(postcontext)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1699
            patch.length2 += len(postcontext)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1700
            if len(patch.diffs) != 0 and patch.diffs[-1][0] == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1701
              patch.diffs[-1] = (self.DIFF_EQUAL, patch.diffs[-1][1] +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1702
                                 postcontext)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1703
            else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1704
              patch.diffs.append((self.DIFF_EQUAL, postcontext))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1705
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1706
          if not empty:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1707
            x += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1708
            patches.insert(x, patch)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1709
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1710
  def patch_toText(self, patches):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1711
    """Take a list of patches and return a textual representation.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1712
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1713
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1714
      patches: Array of patch objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1715
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1716
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1717
      Text representation of patches.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1718
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1719
    text = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1720
    for patch in patches:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1721
      text.append(str(patch))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1722
    return "".join(text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1723
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1724
  def patch_fromText(self, textline):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1725
    """Parse a textual representation of patches and return a list of patch
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1726
    objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1727
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1728
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1729
      textline: Text representation of patches.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1730
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1731
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1732
      Array of patch objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1733
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1734
    Raises:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1735
      ValueError: If invalid input.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1736
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1737
    if type(textline) == unicode:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1738
      # Patches should be composed of a subset of ascii chars, Unicode not
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1739
      # required.  If this encode raises UnicodeEncodeError, patch is invalid.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1740
      textline = textline.encode("ascii")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1741
    patches = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1742
    if not textline:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1743
      return patches
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1744
    text = textline.split('\n')
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1745
    while len(text) != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1746
      m = re.match("^@@ -(\d+),?(\d*) \+(\d+),?(\d*) @@$", text[0])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1747
      if not m:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1748
        raise ValueError, "Invalid patch string: " + text[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1749
      patch = patch_obj()
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1750
      patches.append(patch)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1751
      patch.start1 = int(m.group(1))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1752
      if m.group(2) == '':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1753
        patch.start1 -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1754
        patch.length1 = 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1755
      elif m.group(2) == '0':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1756
        patch.length1 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1757
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1758
        patch.start1 -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1759
        patch.length1 = int(m.group(2))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1760
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1761
      patch.start2 = int(m.group(3))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1762
      if m.group(4) == '':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1763
        patch.start2 -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1764
        patch.length2 = 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1765
      elif m.group(4) == '0':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1766
        patch.length2 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1767
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1768
        patch.start2 -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1769
        patch.length2 = int(m.group(4))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1770
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1771
      del text[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1772
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1773
      while len(text) != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1774
        if text[0]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1775
          sign = text[0][0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1776
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1777
          sign = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1778
        line = urllib.unquote(text[0][1:])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1779
        line = line.decode("utf-8")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1780
        if sign == '+':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1781
          # Insertion.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1782
          patch.diffs.append((self.DIFF_INSERT, line))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1783
        elif sign == '-':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1784
          # Deletion.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1785
          patch.diffs.append((self.DIFF_DELETE, line))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1786
        elif sign == ' ':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1787
          # Minor equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1788
          patch.diffs.append((self.DIFF_EQUAL, line))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1789
        elif sign == '@':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1790
          # Start of next patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1791
          break
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1792
        elif sign == '':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1793
          # Blank line?  Whatever.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1794
          pass
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1795
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1796
          # WTF?
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1797
          raise ValueError, "Invalid patch mode: '%s'\n%s" % (sign, line)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1798
        del text[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1799
    return patches
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1800
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1801
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1802
class patch_obj:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1803
  """Class representing one patch operation.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1804
  """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1805
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1806
  def __init__(self):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1807
    """Initializes with an empty list of diffs.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1808
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1809
    self.diffs = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1810
    self.start1 = None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1811
    self.start2 = None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1812
    self.length1 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1813
    self.length2 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1814
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1815
  def __str__(self):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1816
    """Emmulate GNU diff's format.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1817
    Header: @@ -382,8 +481,9 @@
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1818
    Indicies are printed as 1-based, not 0-based.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1819
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1820
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1821
      The GNU diff string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1822
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1823
    if self.length1 == 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1824
      coords1 = str(self.start1) + ",0"
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1825
    elif self.length1 == 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1826
      coords1 = str(self.start1 + 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1827
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1828
      coords1 = str(self.start1 + 1) + "," + str(self.length1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1829
    if self.length2 == 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1830
      coords2 = str(self.start2) + ",0"
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1831
    elif self.length2 == 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1832
      coords2 = str(self.start2 + 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1833
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1834
      coords2 = str(self.start2 + 1) + "," + str(self.length2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1835
    text = ["@@ -", coords1, " +", coords2, " @@\n"]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1836
    # Escape the body of the patch with %xx notation.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1837
    for (op, data) in self.diffs:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1838
      if op == diff_match_patch.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1839
        text.append("+")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1840
      elif op == diff_match_patch.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1841
        text.append("-")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1842
      elif op == diff_match_patch.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1843
        text.append(" ")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1844
      # High ascii will raise UnicodeDecodeError.  Use Unicode instead.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1845
      data = data.encode("utf-8")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1846
      text.append(urllib.quote(data, "!~*'();/?:@&=+$,# ") + "\n")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1847
    return "".join(text)