src/cm/ext/diff_match_patch.py
author Simon Descarpentries <sid@sopinspace.com>
Tue, 06 May 2014 13:52:01 +0200
changeset 651 9bbc657f6837
parent 250 cae2de810f77
permissions -rwxr-xr-x
Replace DISABLE_TRACKING and TRACKING_HTML by a TRACKING_ID variable in configuration files
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
250
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     1
#!/usr/bin/python2.4
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     3
"""Diff Match and Patch
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     4
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     5
Copyright 2006 Google Inc.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     6
http://code.google.com/p/google-diff-match-patch/
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     7
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     8
Licensed under the Apache License, Version 2.0 (the "License");
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
     9
you may not use this file except in compliance with the License.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    10
You may obtain a copy of the License at
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    11
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    12
  http://www.apache.org/licenses/LICENSE-2.0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    13
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    14
Unless required by applicable law or agreed to in writing, software
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    15
distributed under the License is distributed on an "AS IS" BASIS,
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    16
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    17
See the License for the specific language governing permissions and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    18
limitations under the License.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    19
"""
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    20
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    21
"""Functions for diff, match and patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    22
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    23
Computes the difference between two texts to create a patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    24
Applies the patch onto another text, allowing for errors.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    25
"""
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    26
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    27
__author__ = 'fraser@google.com (Neil Fraser)'
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    28
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    29
import math
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    30
import time
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    31
import urllib
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    32
import re
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    33
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    34
class diff_match_patch:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    35
  """Class containing the diff, match and patch methods.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    36
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    37
  Also contains the behaviour settings.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    38
  """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    39
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    40
  def __init__(self):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    41
    """Inits a diff_match_patch object with default settings.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    42
    Redefine these in your program to override the defaults.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    43
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    44
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    45
    # Number of seconds to map a diff before giving up (0 for infinity).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    46
    self.Diff_Timeout = 1.0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    47
    # Cost of an empty edit operation in terms of edit characters.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    48
    self.Diff_EditCost = 4
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    49
    # The size beyond which the double-ended diff activates.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    50
    # Double-ending is twice as fast, but less accurate.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    51
    self.Diff_DualThreshold = 32
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    52
    # At what point is no match declared (0.0 = perfection, 1.0 = very loose).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    53
    self.Match_Threshold = 0.5
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    54
    # How far to search for a match (0 = exact location, 1000+ = broad match).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    55
    # A match this many characters away from the expected location will add
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    56
    # 1.0 to the score (0.0 is a perfect match).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    57
    self.Match_Distance = 1000
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    58
    # When deleting a large block of text (over ~64 characters), how close does
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    59
    # the contents have to match the expected contents. (0.0 = perfection,
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    60
    # 1.0 = very loose).  Note that Match_Threshold controls how closely the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    61
    # end points of a delete need to match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    62
    self.Patch_DeleteThreshold = 0.5
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    63
    # Chunk size for context length.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    64
    self.Patch_Margin = 4
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    65
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    66
    # How many bits in a number?
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    67
    # Python has no maximum, thus to disable patch splitting set to 0.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    68
    # However to avoid long patches in certain pathological cases, use 32.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    69
    # Multiple short patches (using native ints) are much faster than long ones.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    70
    self.Match_MaxBits = 32
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    71
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    72
  #  DIFF FUNCTIONS
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    73
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    74
  # The data structure representing a diff is an array of tuples:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    75
  # [(DIFF_DELETE, "Hello"), (DIFF_INSERT, "Goodbye"), (DIFF_EQUAL, " world.")]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    76
  # which means: delete "Hello", add "Goodbye" and keep " world."
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    77
  DIFF_DELETE = -1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    78
  DIFF_INSERT = 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    79
  DIFF_EQUAL = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    80
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    81
  def diff_main(self, text1, text2, checklines=True):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    82
    """Find the differences between two texts.  Simplifies the problem by
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    83
      stripping any common prefix or suffix off the texts before diffing.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    84
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    85
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    86
      text1: Old string to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    87
      text2: New string to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    88
      checklines: Optional speedup flag.  If present and false, then don't run
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    89
        a line-level diff first to identify the changed areas.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    90
        Defaults to true, which does a faster, slightly less optimal diff.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    91
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    92
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    93
      Array of changes.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    94
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    95
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    96
    # Check for equality (speedup)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    97
    if text1 == text2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    98
      return [(self.DIFF_EQUAL, text1)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
    99
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   100
    # Trim off common prefix (speedup)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   101
    commonlength = self.diff_commonPrefix(text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   102
    commonprefix = text1[:commonlength]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   103
    text1 = text1[commonlength:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   104
    text2 = text2[commonlength:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   105
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   106
    # Trim off common suffix (speedup)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   107
    commonlength = self.diff_commonSuffix(text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   108
    if commonlength == 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   109
      commonsuffix = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   110
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   111
      commonsuffix = text1[-commonlength:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   112
      text1 = text1[:-commonlength]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   113
      text2 = text2[:-commonlength]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   114
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   115
    # Compute the diff on the middle block
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   116
    diffs = self.diff_compute(text1, text2, checklines)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   117
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   118
    # Restore the prefix and suffix
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   119
    if commonprefix:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   120
      diffs[:0] = [(self.DIFF_EQUAL, commonprefix)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   121
    if commonsuffix:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   122
      diffs.append((self.DIFF_EQUAL, commonsuffix))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   123
    self.diff_cleanupMerge(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   124
    return diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   125
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   126
  def diff_compute(self, text1, text2, checklines):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   127
    """Find the differences between two texts.  Assumes that the texts do not
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   128
      have any common prefix or suffix.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   129
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   130
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   131
      text1: Old string to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   132
      text2: New string to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   133
      checklines: Speedup flag.  If false, then don't run a line-level diff
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   134
        first to identify the changed areas.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   135
        If true, then run a faster, slightly less optimal diff.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   136
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   137
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   138
      Array of changes.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   139
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   140
    if not text1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   141
      # Just add some text (speedup)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   142
      return [(self.DIFF_INSERT, text2)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   143
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   144
    if not text2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   145
      # Just delete some text (speedup)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   146
      return [(self.DIFF_DELETE, text1)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   147
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   148
    if len(text1) > len(text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   149
      (longtext, shorttext) = (text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   150
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   151
      (shorttext, longtext) = (text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   152
    i = longtext.find(shorttext)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   153
    if i != -1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   154
      # Shorter text is inside the longer text (speedup)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   155
      diffs = [(self.DIFF_INSERT, longtext[:i]), (self.DIFF_EQUAL, shorttext),
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   156
               (self.DIFF_INSERT, longtext[i + len(shorttext):])]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   157
      # Swap insertions for deletions if diff is reversed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   158
      if len(text1) > len(text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   159
        diffs[0] = (self.DIFF_DELETE, diffs[0][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   160
        diffs[2] = (self.DIFF_DELETE, diffs[2][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   161
      return diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   162
    longtext = shorttext = None  # Garbage collect.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   163
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   164
    # Check to see if the problem can be split in two.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   165
    hm = self.diff_halfMatch(text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   166
    if hm:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   167
      # A half-match was found, sort out the return data.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   168
      (text1_a, text1_b, text2_a, text2_b, mid_common) = hm
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   169
      # Send both pairs off for separate processing.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   170
      diffs_a = self.diff_main(text1_a, text2_a, checklines)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   171
      diffs_b = self.diff_main(text1_b, text2_b, checklines)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   172
      # Merge the results.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   173
      return diffs_a + [(self.DIFF_EQUAL, mid_common)] + diffs_b
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   174
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   175
    # Perform a real diff.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   176
    if checklines and (len(text1) < 100 or len(text2) < 100):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   177
      checklines = False  # Too trivial for the overhead.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   178
    if checklines:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   179
      # Scan the text on a line-by-line basis first.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   180
      (text1, text2, linearray) = self.diff_linesToChars(text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   181
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   182
    diffs = self.diff_map(text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   183
    if not diffs:  # No acceptable result.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   184
      diffs = [(self.DIFF_DELETE, text1), (self.DIFF_INSERT, text2)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   185
    if checklines:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   186
      # Convert the diff back to original text.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   187
      self.diff_charsToLines(diffs, linearray)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   188
      # Eliminate freak matches (e.g. blank lines)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   189
      self.diff_cleanupSemantic(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   190
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   191
      # Rediff any replacement blocks, this time character-by-character.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   192
      # Add a dummy entry at the end.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   193
      diffs.append((self.DIFF_EQUAL, ''))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   194
      pointer = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   195
      count_delete = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   196
      count_insert = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   197
      text_delete = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   198
      text_insert = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   199
      while pointer < len(diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   200
        if diffs[pointer][0] == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   201
          count_insert += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   202
          text_insert += diffs[pointer][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   203
        elif diffs[pointer][0] == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   204
          count_delete += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   205
          text_delete += diffs[pointer][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   206
        elif diffs[pointer][0] == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   207
          # Upon reaching an equality, check for prior redundancies.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   208
          if count_delete >= 1 and count_insert >= 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   209
            # Delete the offending records and add the merged ones.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   210
            a = self.diff_main(text_delete, text_insert, False)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   211
            diffs[pointer - count_delete - count_insert : pointer] = a
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   212
            pointer = pointer - count_delete - count_insert + len(a)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   213
          count_insert = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   214
          count_delete = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   215
          text_delete = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   216
          text_insert = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   217
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   218
        pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   219
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   220
      diffs.pop()  # Remove the dummy entry at the end.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   221
    return diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   222
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   223
  def diff_linesToChars(self, text1, text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   224
    """Split two texts into an array of strings.  Reduce the texts to a string
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   225
    of hashes where each Unicode character represents one line.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   226
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   227
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   228
      text1: First string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   229
      text2: Second string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   230
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   231
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   232
      Three element tuple, containing the encoded text1, the encoded text2 and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   233
      the array of unique strings.  The zeroth element of the array of unique
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   234
      strings is intentionally blank.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   235
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   236
    lineArray = []  # e.g. lineArray[4] == "Hello\n"
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   237
    lineHash = {}   # e.g. lineHash["Hello\n"] == 4
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   238
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   239
    # "\x00" is a valid character, but various debuggers don't like it.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   240
    # So we'll insert a junk entry to avoid generating a null character.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   241
    lineArray.append('')
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   242
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   243
    def diff_linesToCharsMunge(text):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   244
      """Split a text into an array of strings.  Reduce the texts to a string
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   245
      of hashes where each Unicode character represents one line.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   246
      Modifies linearray and linehash through being a closure.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   247
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   248
      Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   249
        text: String to encode.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   250
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   251
      Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   252
        Encoded string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   253
      """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   254
      chars = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   255
      # Walk the text, pulling out a substring for each line.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   256
      # text.split('\n') would would temporarily double our memory footprint.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   257
      # Modifying text would create many large strings to garbage collect.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   258
      lineStart = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   259
      lineEnd = -1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   260
      while lineEnd < len(text) - 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   261
        lineEnd = text.find('\n', lineStart)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   262
        if lineEnd == -1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   263
          lineEnd = len(text) - 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   264
        line = text[lineStart:lineEnd + 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   265
        lineStart = lineEnd + 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   266
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   267
        if line in lineHash:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   268
          chars.append(unichr(lineHash[line]))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   269
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   270
          lineArray.append(line)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   271
          lineHash[line] = len(lineArray) - 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   272
          chars.append(unichr(len(lineArray) - 1))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   273
      return "".join(chars)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   274
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   275
    chars1 = diff_linesToCharsMunge(text1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   276
    chars2 = diff_linesToCharsMunge(text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   277
    return (chars1, chars2, lineArray)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   278
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   279
  def diff_charsToLines(self, diffs, lineArray):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   280
    """Rehydrate the text in a diff from a string of line hashes to real lines
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   281
    of text.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   282
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   283
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   284
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   285
      lineArray: Array of unique strings.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   286
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   287
    for x in xrange(len(diffs)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   288
      text = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   289
      for char in diffs[x][1]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   290
        text.append(lineArray[ord(char)])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   291
      diffs[x] = (diffs[x][0], "".join(text))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   292
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   293
  def diff_map(self, text1, text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   294
    """Explore the intersection points between the two texts.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   295
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   296
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   297
      text1: Old string to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   298
      text2: New string to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   299
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   300
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   301
      Array of diff tuples or None if no diff available.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   302
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   303
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   304
    # Unlike in most languages, Python counts time in seconds.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   305
    s_end = time.time() + self.Diff_Timeout  # Don't run for too long.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   306
    # Cache the text lengths to prevent multiple calls.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   307
    text1_length = len(text1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   308
    text2_length = len(text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   309
    max_d = text1_length + text2_length - 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   310
    doubleEnd = self.Diff_DualThreshold * 2 < max_d
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   311
    v_map1 = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   312
    v_map2 = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   313
    v1 = {}
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   314
    v2 = {}
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   315
    v1[1] = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   316
    v2[1] = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   317
    footsteps = {}
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   318
    done = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   319
    # If the total number of characters is odd, then the front path will
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   320
    # collide with the reverse path.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   321
    front = (text1_length + text2_length) % 2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   322
    for d in xrange(max_d):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   323
      # Bail out if timeout reached.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   324
      if self.Diff_Timeout > 0 and time.time() > s_end:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   325
        return None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   326
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   327
      # Walk the front path one step.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   328
      v_map1.append({})
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   329
      for k in xrange(-d, d + 1, 2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   330
        if k == -d or k != d and v1[k - 1] < v1[k + 1]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   331
          x = v1[k + 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   332
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   333
          x = v1[k - 1] + 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   334
        y = x - k
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   335
        if doubleEnd:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   336
          footstep = (x, y)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   337
          if front and footstep in footsteps:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   338
            done = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   339
          if not front:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   340
            footsteps[footstep] = d
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   341
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   342
        while (not done and x < text1_length and y < text2_length and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   343
               text1[x] == text2[y]):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   344
          x += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   345
          y += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   346
          if doubleEnd:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   347
            footstep = (x, y)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   348
            if front and footstep in footsteps:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   349
              done = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   350
            if not front:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   351
              footsteps[footstep] = d
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   352
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   353
        v1[k] = x
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   354
        v_map1[d][(x, y)] = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   355
        if x == text1_length and y == text2_length:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   356
          # Reached the end in single-path mode.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   357
          return self.diff_path1(v_map1, text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   358
        elif done:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   359
          # Front path ran over reverse path.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   360
          v_map2 = v_map2[:footsteps[footstep] + 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   361
          a = self.diff_path1(v_map1, text1[:x], text2[:y])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   362
          b = self.diff_path2(v_map2, text1[x:], text2[y:])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   363
          return a + b
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   364
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   365
      if doubleEnd:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   366
        # Walk the reverse path one step.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   367
        v_map2.append({})
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   368
        for k in xrange(-d, d + 1, 2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   369
          if k == -d or k != d and v2[k - 1] < v2[k + 1]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   370
            x = v2[k + 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   371
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   372
            x = v2[k - 1] + 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   373
          y = x - k
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   374
          footstep = (text1_length - x, text2_length - y)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   375
          if not front and footstep in footsteps:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   376
            done = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   377
          if front:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   378
            footsteps[footstep] = d
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   379
          while (not done and x < text1_length and y < text2_length and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   380
                 text1[-x - 1] == text2[-y - 1]):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   381
            x += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   382
            y += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   383
            footstep = (text1_length - x, text2_length - y)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   384
            if not front and footstep in footsteps:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   385
              done = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   386
            if front:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   387
              footsteps[footstep] = d
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   388
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   389
          v2[k] = x
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   390
          v_map2[d][(x, y)] = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   391
          if done:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   392
            # Reverse path ran over front path.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   393
            v_map1 = v_map1[:footsteps[footstep] + 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   394
            a = self.diff_path1(v_map1, text1[:text1_length - x],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   395
                                text2[:text2_length - y])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   396
            b = self.diff_path2(v_map2, text1[text1_length - x:],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   397
                                text2[text2_length - y:])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   398
            return a + b
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   399
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   400
    # Number of diffs equals number of characters, no commonality at all.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   401
    return None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   402
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   403
  def diff_path1(self, v_map, text1, text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   404
    """Work from the middle back to the start to determine the path.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   405
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   406
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   407
      v_map: Array of paths.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   408
      text1: Old string fragment to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   409
      text2: New string fragment to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   410
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   411
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   412
      Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   413
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   414
    path = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   415
    x = len(text1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   416
    y = len(text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   417
    last_op = None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   418
    for d in xrange(len(v_map) - 2, -1, -1):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   419
      while True:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   420
        if (x - 1, y) in v_map[d]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   421
          x -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   422
          if last_op == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   423
            path[0] = (self.DIFF_DELETE, text1[x] + path[0][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   424
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   425
            path[:0] = [(self.DIFF_DELETE, text1[x])]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   426
          last_op = self.DIFF_DELETE
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   427
          break
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   428
        elif (x, y - 1) in v_map[d]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   429
          y -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   430
          if last_op == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   431
            path[0] = (self.DIFF_INSERT, text2[y] + path[0][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   432
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   433
            path[:0] = [(self.DIFF_INSERT, text2[y])]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   434
          last_op = self.DIFF_INSERT
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   435
          break
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   436
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   437
          x -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   438
          y -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   439
          assert text1[x] == text2[y], ("No diagonal.  " +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   440
              "Can't happen. (diff_path1)")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   441
          if last_op == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   442
            path[0] = (self.DIFF_EQUAL, text1[x] + path[0][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   443
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   444
            path[:0] = [(self.DIFF_EQUAL, text1[x])]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   445
          last_op = self.DIFF_EQUAL
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   446
    return path
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   447
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   448
  def diff_path2(self, v_map, text1, text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   449
    """Work from the middle back to the end to determine the path.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   450
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   451
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   452
      v_map: Array of paths.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   453
      text1: Old string fragment to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   454
      text2: New string fragment to be diffed.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   455
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   456
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   457
      Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   458
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   459
    path = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   460
    x = len(text1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   461
    y = len(text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   462
    last_op = None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   463
    for d in xrange(len(v_map) - 2, -1, -1):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   464
      while True:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   465
        if (x - 1, y) in v_map[d]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   466
          x -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   467
          if last_op == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   468
            path[-1] = (self.DIFF_DELETE, path[-1][1] + text1[-x - 1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   469
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   470
            path.append((self.DIFF_DELETE, text1[-x - 1]))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   471
          last_op = self.DIFF_DELETE
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   472
          break
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   473
        elif (x, y - 1) in v_map[d]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   474
          y -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   475
          if last_op == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   476
            path[-1] = (self.DIFF_INSERT, path[-1][1] + text2[-y - 1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   477
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   478
            path.append((self.DIFF_INSERT, text2[-y - 1]))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   479
          last_op = self.DIFF_INSERT
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   480
          break
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   481
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   482
          x -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   483
          y -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   484
          assert text1[-x - 1] == text2[-y - 1], ("No diagonal.  " +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   485
              "Can't happen. (diff_path2)")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   486
          if last_op == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   487
            path[-1] = (self.DIFF_EQUAL, path[-1][1] + text1[-x - 1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   488
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   489
            path.append((self.DIFF_EQUAL, text1[-x - 1]))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   490
          last_op = self.DIFF_EQUAL
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   491
    return path
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   492
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   493
  def diff_commonPrefix(self, text1, text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   494
    """Determine the common prefix of two strings.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   495
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   496
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   497
      text1: First string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   498
      text2: Second string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   499
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   500
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   501
      The number of characters common to the start of each string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   502
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   503
    # Quick check for common null cases.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   504
    if not text1 or not text2 or text1[0] != text2[0]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   505
      return 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   506
    # Binary search.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   507
    # Performance analysis: http://neil.fraser.name/news/2007/10/09/
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   508
    pointermin = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   509
    pointermax = min(len(text1), len(text2))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   510
    pointermid = pointermax
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   511
    pointerstart = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   512
    while pointermin < pointermid:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   513
      if text1[pointerstart:pointermid] == text2[pointerstart:pointermid]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   514
        pointermin = pointermid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   515
        pointerstart = pointermin
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   516
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   517
        pointermax = pointermid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   518
      pointermid = int((pointermax - pointermin) / 2 + pointermin)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   519
    return pointermid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   520
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   521
  def diff_commonSuffix(self, text1, text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   522
    """Determine the common suffix of two strings.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   523
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   524
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   525
      text1: First string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   526
      text2: Second string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   527
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   528
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   529
      The number of characters common to the end of each string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   530
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   531
    # Quick check for common null cases.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   532
    if not text1 or not text2 or text1[-1] != text2[-1]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   533
      return 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   534
    # Binary search.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   535
    # Performance analysis: http://neil.fraser.name/news/2007/10/09/
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   536
    pointermin = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   537
    pointermax = min(len(text1), len(text2))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   538
    pointermid = pointermax
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   539
    pointerend = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   540
    while pointermin < pointermid:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   541
      if (text1[-pointermid:len(text1) - pointerend] ==
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   542
          text2[-pointermid:len(text2) - pointerend]):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   543
        pointermin = pointermid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   544
        pointerend = pointermin
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   545
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   546
        pointermax = pointermid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   547
      pointermid = int((pointermax - pointermin) / 2 + pointermin)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   548
    return pointermid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   549
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   550
  def diff_halfMatch(self, text1, text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   551
    """Do the two texts share a substring which is at least half the length of
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   552
    the longer text?
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   553
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   554
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   555
      text1: First string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   556
      text2: Second string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   557
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   558
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   559
      Five element Array, containing the prefix of text1, the suffix of text1,
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   560
      the prefix of text2, the suffix of text2 and the common middle.  Or None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   561
      if there was no match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   562
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   563
    if len(text1) > len(text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   564
      (longtext, shorttext) = (text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   565
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   566
      (shorttext, longtext) = (text1, text2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   567
    if len(longtext) < 10 or len(shorttext) < 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   568
      return None  # Pointless.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   569
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   570
    def diff_halfMatchI(longtext, shorttext, i):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   571
      """Does a substring of shorttext exist within longtext such that the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   572
      substring is at least half the length of longtext?
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   573
      Closure, but does not reference any external variables.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   574
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   575
      Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   576
        longtext: Longer string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   577
        shorttext: Shorter string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   578
        i: Start index of quarter length substring within longtext.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   579
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   580
      Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   581
        Five element Array, containing the prefix of longtext, the suffix of
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   582
        longtext, the prefix of shorttext, the suffix of shorttext and the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   583
        common middle.  Or None if there was no match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   584
      """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   585
      seed = longtext[i:i + len(longtext) / 4]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   586
      best_common = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   587
      j = shorttext.find(seed)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   588
      while j != -1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   589
        prefixLength = self.diff_commonPrefix(longtext[i:], shorttext[j:])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   590
        suffixLength = self.diff_commonSuffix(longtext[:i], shorttext[:j])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   591
        if len(best_common) < suffixLength + prefixLength:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   592
          best_common = (shorttext[j - suffixLength:j] +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   593
              shorttext[j:j + prefixLength])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   594
          best_longtext_a = longtext[:i - suffixLength]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   595
          best_longtext_b = longtext[i + prefixLength:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   596
          best_shorttext_a = shorttext[:j - suffixLength]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   597
          best_shorttext_b = shorttext[j + prefixLength:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   598
        j = shorttext.find(seed, j + 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   599
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   600
      if len(best_common) >= len(longtext) / 2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   601
        return (best_longtext_a, best_longtext_b,
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   602
                best_shorttext_a, best_shorttext_b, best_common)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   603
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   604
        return None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   605
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   606
    # First check if the second quarter is the seed for a half-match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   607
    hm1 = diff_halfMatchI(longtext, shorttext, (len(longtext) + 3) / 4)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   608
    # Check again based on the third quarter.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   609
    hm2 = diff_halfMatchI(longtext, shorttext, (len(longtext) + 1) / 2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   610
    if not hm1 and not hm2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   611
      return None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   612
    elif not hm2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   613
      hm = hm1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   614
    elif not hm1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   615
      hm = hm2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   616
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   617
      # Both matched.  Select the longest.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   618
      if len(hm1[4]) > len(hm2[4]):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   619
        hm = hm1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   620
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   621
        hm = hm2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   622
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   623
    # A half-match was found, sort out the return data.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   624
    if len(text1) > len(text2):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   625
      (text1_a, text1_b, text2_a, text2_b, mid_common) = hm
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   626
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   627
      (text2_a, text2_b, text1_a, text1_b, mid_common) = hm
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   628
    return (text1_a, text1_b, text2_a, text2_b, mid_common)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   629
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   630
  def diff_cleanupSemantic(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   631
    """Reduce the number of edits by eliminating semantically trivial
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   632
    equalities.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   633
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   634
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   635
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   636
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   637
    changes = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   638
    equalities = []  # Stack of indices where equalities are found.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   639
    lastequality = None  # Always equal to equalities[-1][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   640
    pointer = 0  # Index of current position.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   641
    length_changes1 = 0  # Number of chars that changed prior to the equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   642
    length_changes2 = 0  # Number of chars that changed after the equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   643
    while pointer < len(diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   644
      if diffs[pointer][0] == self.DIFF_EQUAL:  # equality found
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   645
        equalities.append(pointer)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   646
        length_changes1 = length_changes2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   647
        length_changes2 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   648
        lastequality = diffs[pointer][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   649
      else:  # an insertion or deletion
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   650
        length_changes2 += len(diffs[pointer][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   651
        if (lastequality != None and (len(lastequality) <= length_changes1) and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   652
            (len(lastequality) <= length_changes2)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   653
          # Duplicate record
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   654
          diffs.insert(equalities[-1], (self.DIFF_DELETE, lastequality))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   655
          # Change second copy to insert.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   656
          diffs[equalities[-1] + 1] = (self.DIFF_INSERT,
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   657
              diffs[equalities[-1] + 1][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   658
          # Throw away the equality we just deleted.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   659
          equalities.pop()
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   660
          # Throw away the previous equality (it needs to be reevaluated).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   661
          if len(equalities) != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   662
            equalities.pop()
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   663
          if len(equalities):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   664
            pointer = equalities[-1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   665
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   666
            pointer = -1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   667
          length_changes1 = 0  # Reset the counters.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   668
          length_changes2 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   669
          lastequality = None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   670
          changes = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   671
      pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   672
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   673
    if changes:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   674
      self.diff_cleanupMerge(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   675
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   676
    self.diff_cleanupSemanticLossless(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   677
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   678
  def diff_cleanupSemanticLossless(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   679
    """Look for single edits surrounded on both sides by equalities
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   680
    which can be shifted sideways to align the edit to a word boundary.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   681
    e.g: The c<ins>at c</ins>ame. -> The <ins>cat </ins>came.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   682
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   683
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   684
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   685
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   686
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   687
    def diff_cleanupSemanticScore(one, two):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   688
      """Given two strings, compute a score representing whether the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   689
      internal boundary falls on logical boundaries.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   690
      Scores range from 5 (best) to 0 (worst).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   691
      Closure, but does not reference any external variables.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   692
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   693
      Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   694
        one: First string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   695
        two: Second string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   696
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   697
      Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   698
        The score.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   699
      """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   700
      if not one or not two:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   701
        # Edges are the best.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   702
        return 5
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   703
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   704
      # Each port of this function behaves slightly differently due to
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   705
      # subtle differences in each language's definition of things like
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   706
      # 'whitespace'.  Since this function's purpose is largely cosmetic,
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   707
      # the choice has been made to use each language's native features
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   708
      # rather than force total conformity.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   709
      score = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   710
      # One point for non-alphanumeric.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   711
      if not one[-1].isalnum() or not two[0].isalnum():
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   712
        score += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   713
        # Two points for whitespace.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   714
        if one[-1].isspace() or two[0].isspace():
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   715
          score += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   716
          # Three points for line breaks.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   717
          if (one[-1] == "\r" or one[-1] == "\n" or
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   718
              two[0] == "\r" or two[0] == "\n"):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   719
            score += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   720
            # Four points for blank lines.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   721
            if (re.search("\\n\\r?\\n$", one) or
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   722
                re.match("^\\r?\\n\\r?\\n", two)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   723
              score += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   724
      return score
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   725
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   726
    pointer = 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   727
    # Intentionally ignore the first and last element (don't need checking).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   728
    while pointer < len(diffs) - 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   729
      if (diffs[pointer - 1][0] == self.DIFF_EQUAL and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   730
          diffs[pointer + 1][0] == self.DIFF_EQUAL):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   731
        # This is a single edit surrounded by equalities.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   732
        equality1 = diffs[pointer - 1][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   733
        edit = diffs[pointer][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   734
        equality2 = diffs[pointer + 1][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   735
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   736
        # First, shift the edit as far left as possible.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   737
        commonOffset = self.diff_commonSuffix(equality1, edit)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   738
        if commonOffset:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   739
          commonString = edit[-commonOffset:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   740
          equality1 = equality1[:-commonOffset]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   741
          edit = commonString + edit[:-commonOffset]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   742
          equality2 = commonString + equality2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   743
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   744
        # Second, step character by character right, looking for the best fit.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   745
        bestEquality1 = equality1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   746
        bestEdit = edit
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   747
        bestEquality2 = equality2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   748
        bestScore = (diff_cleanupSemanticScore(equality1, edit) +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   749
            diff_cleanupSemanticScore(edit, equality2))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   750
        while edit and equality2 and edit[0] == equality2[0]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   751
          equality1 += edit[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   752
          edit = edit[1:] + equality2[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   753
          equality2 = equality2[1:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   754
          score = (diff_cleanupSemanticScore(equality1, edit) +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   755
              diff_cleanupSemanticScore(edit, equality2))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   756
          # The >= encourages trailing rather than leading whitespace on edits.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   757
          if score >= bestScore:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   758
            bestScore = score
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   759
            bestEquality1 = equality1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   760
            bestEdit = edit
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   761
            bestEquality2 = equality2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   762
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   763
        if diffs[pointer - 1][1] != bestEquality1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   764
          # We have an improvement, save it back to the diff.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   765
          if bestEquality1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   766
            diffs[pointer - 1] = (diffs[pointer - 1][0], bestEquality1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   767
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   768
            del diffs[pointer - 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   769
            pointer -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   770
          diffs[pointer] = (diffs[pointer][0], bestEdit)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   771
          if bestEquality2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   772
            diffs[pointer + 1] = (diffs[pointer + 1][0], bestEquality2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   773
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   774
            del diffs[pointer + 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   775
            pointer -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   776
      pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   777
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   778
  def diff_cleanupEfficiency(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   779
    """Reduce the number of edits by eliminating operationally trivial
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   780
    equalities.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   781
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   782
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   783
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   784
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   785
    changes = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   786
    equalities = []  # Stack of indices where equalities are found.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   787
    lastequality = ''  # Always equal to equalities[-1][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   788
    pointer = 0  # Index of current position.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   789
    pre_ins = False  # Is there an insertion operation before the last equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   790
    pre_del = False  # Is there a deletion operation before the last equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   791
    post_ins = False  # Is there an insertion operation after the last equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   792
    post_del = False  # Is there a deletion operation after the last equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   793
    while pointer < len(diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   794
      if diffs[pointer][0] == self.DIFF_EQUAL:  # equality found
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   795
        if (len(diffs[pointer][1]) < self.Diff_EditCost and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   796
            (post_ins or post_del)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   797
          # Candidate found.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   798
          equalities.append(pointer)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   799
          pre_ins = post_ins
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   800
          pre_del = post_del
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   801
          lastequality = diffs[pointer][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   802
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   803
          # Not a candidate, and can never become one.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   804
          equalities = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   805
          lastequality = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   806
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   807
        post_ins = post_del = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   808
      else:  # an insertion or deletion
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   809
        if diffs[pointer][0] == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   810
          post_del = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   811
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   812
          post_ins = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   813
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   814
        # Five types to be split:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   815
        # <ins>A</ins><del>B</del>XY<ins>C</ins><del>D</del>
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   816
        # <ins>A</ins>X<ins>C</ins><del>D</del>
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   817
        # <ins>A</ins><del>B</del>X<ins>C</ins>
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   818
        # <ins>A</del>X<ins>C</ins><del>D</del>
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   819
        # <ins>A</ins><del>B</del>X<del>C</del>
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   820
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   821
        if lastequality and ((pre_ins and pre_del and post_ins and post_del) or
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   822
                             ((len(lastequality) < self.Diff_EditCost / 2) and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   823
                              (pre_ins + pre_del + post_ins + post_del) == 3)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   824
          # Duplicate record
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   825
          diffs.insert(equalities[-1], (self.DIFF_DELETE, lastequality))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   826
          # Change second copy to insert.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   827
          diffs[equalities[-1] + 1] = (self.DIFF_INSERT,
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   828
              diffs[equalities[-1] + 1][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   829
          equalities.pop()  # Throw away the equality we just deleted
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   830
          lastequality = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   831
          if pre_ins and pre_del:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   832
            # No changes made which could affect previous entry, keep going.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   833
            post_ins = post_del = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   834
            equalities = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   835
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   836
            if len(equalities):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   837
              equalities.pop()  # Throw away the previous equality
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   838
            if len(equalities):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   839
              pointer = equalities[-1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   840
            else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   841
              pointer = -1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   842
            post_ins = post_del = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   843
          changes = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   844
      pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   845
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   846
    if changes:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   847
      self.diff_cleanupMerge(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   848
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   849
  def diff_cleanupMerge(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   850
    """Reorder and merge like edit sections.  Merge equalities.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   851
    Any edit section can move as long as it doesn't cross an equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   852
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   853
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   854
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   855
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   856
    diffs.append((self.DIFF_EQUAL, ''))  # Add a dummy entry at the end.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   857
    pointer = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   858
    count_delete = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   859
    count_insert = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   860
    text_delete = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   861
    text_insert = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   862
    while pointer < len(diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   863
      if diffs[pointer][0] == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   864
        count_insert += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   865
        text_insert += diffs[pointer][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   866
        pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   867
      elif diffs[pointer][0] == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   868
        count_delete += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   869
        text_delete += diffs[pointer][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   870
        pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   871
      elif diffs[pointer][0] == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   872
        # Upon reaching an equality, check for prior redundancies.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   873
        if count_delete != 0 or count_insert != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   874
          if count_delete != 0 and count_insert != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   875
            # Factor out any common prefixies.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   876
            commonlength = self.diff_commonPrefix(text_insert, text_delete)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   877
            if commonlength != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   878
              x = pointer - count_delete - count_insert - 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   879
              if x >= 0 and diffs[x][0] == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   880
                diffs[x] = (diffs[x][0], diffs[x][1] +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   881
                            text_insert[:commonlength])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   882
              else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   883
                diffs.insert(0, (self.DIFF_EQUAL, text_insert[:commonlength]))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   884
                pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   885
              text_insert = text_insert[commonlength:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   886
              text_delete = text_delete[commonlength:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   887
            # Factor out any common suffixies.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   888
            commonlength = self.diff_commonSuffix(text_insert, text_delete)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   889
            if commonlength != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   890
              diffs[pointer] = (diffs[pointer][0], text_insert[-commonlength:] +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   891
                  diffs[pointer][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   892
              text_insert = text_insert[:-commonlength]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   893
              text_delete = text_delete[:-commonlength]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   894
          # Delete the offending records and add the merged ones.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   895
          if count_delete == 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   896
            diffs[pointer - count_insert : pointer] = [
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   897
                (self.DIFF_INSERT, text_insert)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   898
          elif count_insert == 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   899
            diffs[pointer - count_delete : pointer] = [
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   900
                (self.DIFF_DELETE, text_delete)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   901
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   902
            diffs[pointer - count_delete - count_insert : pointer] = [
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   903
                (self.DIFF_DELETE, text_delete),
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   904
                (self.DIFF_INSERT, text_insert)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   905
          pointer = pointer - count_delete - count_insert + 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   906
          if count_delete != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   907
            pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   908
          if count_insert != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   909
            pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   910
        elif pointer != 0 and diffs[pointer - 1][0] == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   911
          # Merge this equality with the previous one.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   912
          diffs[pointer - 1] = (diffs[pointer - 1][0],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   913
                                diffs[pointer - 1][1] + diffs[pointer][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   914
          del diffs[pointer]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   915
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   916
          pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   917
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   918
        count_insert = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   919
        count_delete = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   920
        text_delete = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   921
        text_insert = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   922
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   923
    if diffs[-1][1] == '':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   924
      diffs.pop()  # Remove the dummy entry at the end.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   925
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   926
    # Second pass: look for single edits surrounded on both sides by equalities
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   927
    # which can be shifted sideways to eliminate an equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   928
    # e.g: A<ins>BA</ins>C -> <ins>AB</ins>AC
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   929
    changes = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   930
    pointer = 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   931
    # Intentionally ignore the first and last element (don't need checking).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   932
    while pointer < len(diffs) - 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   933
      if (diffs[pointer - 1][0] == self.DIFF_EQUAL and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   934
          diffs[pointer + 1][0] == self.DIFF_EQUAL):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   935
        # This is a single edit surrounded by equalities.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   936
        if diffs[pointer][1].endswith(diffs[pointer - 1][1]):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   937
          # Shift the edit over the previous equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   938
          diffs[pointer] = (diffs[pointer][0],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   939
              diffs[pointer - 1][1] +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   940
              diffs[pointer][1][:-len(diffs[pointer - 1][1])])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   941
          diffs[pointer + 1] = (diffs[pointer + 1][0],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   942
                                diffs[pointer - 1][1] + diffs[pointer + 1][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   943
          del diffs[pointer - 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   944
          changes = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   945
        elif diffs[pointer][1].startswith(diffs[pointer + 1][1]):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   946
          # Shift the edit over the next equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   947
          diffs[pointer - 1] = (diffs[pointer - 1][0],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   948
                                diffs[pointer - 1][1] + diffs[pointer + 1][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   949
          diffs[pointer] = (diffs[pointer][0],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   950
              diffs[pointer][1][len(diffs[pointer + 1][1]):] +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   951
              diffs[pointer + 1][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   952
          del diffs[pointer + 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   953
          changes = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   954
      pointer += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   955
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   956
    # If shifts were made, the diff needs reordering and another shift sweep.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   957
    if changes:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   958
      self.diff_cleanupMerge(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   959
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   960
  def diff_xIndex(self, diffs, loc):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   961
    """loc is a location in text1, compute and return the equivalent location
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   962
    in text2.  e.g. "The cat" vs "The big cat", 1->1, 5->8
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   963
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   964
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   965
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   966
      loc: Location within text1.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   967
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   968
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   969
      Location within text2.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   970
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   971
    chars1 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   972
    chars2 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   973
    last_chars1 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   974
    last_chars2 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   975
    for x in xrange(len(diffs)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   976
      (op, text) = diffs[x]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   977
      if op != self.DIFF_INSERT:  # Equality or deletion.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   978
        chars1 += len(text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   979
      if op != self.DIFF_DELETE:  # Equality or insertion.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   980
        chars2 += len(text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   981
      if chars1 > loc:  # Overshot the location.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   982
        break
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   983
      last_chars1 = chars1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   984
      last_chars2 = chars2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   985
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   986
    if len(diffs) != x and diffs[x][0] == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   987
      # The location was deleted.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   988
      return last_chars2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   989
    # Add the remaining len(character).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   990
    return last_chars2 + (loc - last_chars1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   991
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   992
  def diff_prettyHtml(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   993
    """Convert a diff array into a pretty HTML report.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   994
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   995
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   996
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   997
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   998
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
   999
      HTML representation.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1000
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1001
    html = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1002
    i = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1003
    for (op, data) in diffs:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1004
      text = (data.replace("&", "&amp;").replace("<", "&lt;")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1005
                 .replace(">", "&gt;").replace("\n", "&para;<BR>"))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1006
      if op == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1007
        html.append("<INS STYLE=\"background:#E6FFE6;\" TITLE=\"i=%i\">%s</INS>"
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1008
            % (i, text))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1009
      elif op == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1010
        html.append("<DEL STYLE=\"background:#FFE6E6;\" TITLE=\"i=%i\">%s</DEL>"
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1011
            % (i, text))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1012
      elif op == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1013
        html.append("<SPAN TITLE=\"i=%i\">%s</SPAN>" % (i, text))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1014
      if op != self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1015
        i += len(data)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1016
    return "".join(html)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1017
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1018
  def diff_text1(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1019
    """Compute and return the source text (all equalities and deletions).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1020
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1021
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1022
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1023
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1024
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1025
      Source text.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1026
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1027
    text = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1028
    for (op, data) in diffs:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1029
      if op != self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1030
        text.append(data)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1031
    return "".join(text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1032
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1033
  def diff_text2(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1034
    """Compute and return the destination text (all equalities and insertions).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1035
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1036
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1037
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1038
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1039
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1040
      Destination text.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1041
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1042
    text = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1043
    for (op, data) in diffs:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1044
      if op != self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1045
        text.append(data)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1046
    return "".join(text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1047
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1048
  def diff_levenshtein(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1049
    """Compute the Levenshtein distance; the number of inserted, deleted or
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1050
    substituted characters.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1051
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1052
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1053
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1054
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1055
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1056
      Number of changes.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1057
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1058
    levenshtein = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1059
    insertions = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1060
    deletions = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1061
    for (op, data) in diffs:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1062
      if op == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1063
        insertions += len(data)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1064
      elif op == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1065
        deletions += len(data)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1066
      elif op == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1067
        # A deletion and an insertion is one substitution.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1068
        levenshtein += max(insertions, deletions)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1069
        insertions = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1070
        deletions = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1071
    levenshtein += max(insertions, deletions)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1072
    return levenshtein
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1073
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1074
  def diff_toDelta(self, diffs):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1075
    """Crush the diff into an encoded string which describes the operations
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1076
    required to transform text1 into text2.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1077
    E.g. =3\t-2\t+ing  -> Keep 3 chars, delete 2 chars, insert 'ing'.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1078
    Operations are tab-separated.  Inserted text is escaped using %xx notation.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1079
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1080
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1081
      diffs: Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1082
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1083
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1084
      Delta text.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1085
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1086
    text = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1087
    for (op, data) in diffs:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1088
      if op == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1089
        # High ascii will raise UnicodeDecodeError.  Use Unicode instead.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1090
        data = data.encode("utf-8")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1091
        text.append("+" + urllib.quote(data, "!~*'();/?:@&=+$,# "))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1092
      elif op == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1093
        text.append("-%d" % len(data))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1094
      elif op == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1095
        text.append("=%d" % len(data))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1096
    return "\t".join(text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1097
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1098
  def diff_fromDelta(self, text1, delta):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1099
    """Given the original text1, and an encoded string which describes the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1100
    operations required to transform text1 into text2, compute the full diff.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1101
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1102
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1103
      text1: Source string for the diff.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1104
      delta: Delta text.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1105
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1106
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1107
      Array of diff tuples.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1108
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1109
    Raises:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1110
      ValueError: If invalid input.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1111
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1112
    if type(delta) == unicode:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1113
      # Deltas should be composed of a subset of ascii chars, Unicode not
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1114
      # required.  If this encode raises UnicodeEncodeError, delta is invalid.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1115
      delta = delta.encode("ascii")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1116
    diffs = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1117
    pointer = 0  # Cursor in text1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1118
    tokens = delta.split("\t")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1119
    for token in tokens:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1120
      if token == "":
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1121
        # Blank tokens are ok (from a trailing \t).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1122
        continue
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1123
      # Each token begins with a one character parameter which specifies the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1124
      # operation of this token (delete, insert, equality).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1125
      param = token[1:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1126
      if token[0] == "+":
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1127
        param = urllib.unquote(param).decode("utf-8")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1128
        diffs.append((self.DIFF_INSERT, param))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1129
      elif token[0] == "-" or token[0] == "=":
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1130
        try:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1131
          n = int(param)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1132
        except ValueError:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1133
          raise ValueError, "Invalid number in diff_fromDelta: " + param
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1134
        if n < 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1135
          raise ValueError, "Negative number in diff_fromDelta: " + param
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1136
        text = text1[pointer : pointer + n]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1137
        pointer += n
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1138
        if token[0] == "=":
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1139
          diffs.append((self.DIFF_EQUAL, text))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1140
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1141
          diffs.append((self.DIFF_DELETE, text))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1142
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1143
        # Anything else is an error.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1144
        raise ValueError, ("Invalid diff operation in diff_fromDelta: " +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1145
            token[0])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1146
    if pointer != len(text1):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1147
      raise ValueError, (
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1148
          "Delta length (%d) does not equal source text length (%d)." %
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1149
         (pointer, len(text1)))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1150
    return diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1151
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1152
  #  MATCH FUNCTIONS
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1153
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1154
  def match_main(self, text, pattern, loc):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1155
    """Locate the best instance of 'pattern' in 'text' near 'loc'.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1156
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1157
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1158
      text: The text to search.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1159
      pattern: The pattern to search for.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1160
      loc: The location to search around.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1161
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1162
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1163
      Best match index or -1.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1164
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1165
    loc = max(0, min(loc, len(text)))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1166
    if text == pattern:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1167
      # Shortcut (potentially not guaranteed by the algorithm)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1168
      return 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1169
    elif not text:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1170
      # Nothing to match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1171
      return -1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1172
    elif text[loc:loc + len(pattern)] == pattern:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1173
      # Perfect match at the perfect spot!  (Includes case of null pattern)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1174
      return loc
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1175
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1176
      # Do a fuzzy compare.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1177
      match = self.match_bitap(text, pattern, loc)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1178
      return match
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1179
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1180
  def match_bitap(self, text, pattern, loc):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1181
    """Locate the best instance of 'pattern' in 'text' near 'loc' using the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1182
    Bitap algorithm.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1183
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1184
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1185
      text: The text to search.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1186
      pattern: The pattern to search for.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1187
      loc: The location to search around.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1188
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1189
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1190
      Best match index or -1.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1191
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1192
    # Python doesn't have a maxint limit, so ignore this check.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1193
    #if self.Match_MaxBits != 0 and len(pattern) > self.Match_MaxBits:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1194
    #  raise ValueError("Pattern too long for this application.")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1195
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1196
    # Initialise the alphabet.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1197
    s = self.match_alphabet(pattern)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1198
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1199
    def match_bitapScore(e, x):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1200
      """Compute and return the score for a match with e errors and x location.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1201
      Accesses loc and pattern through being a closure.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1202
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1203
      Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1204
        e: Number of errors in match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1205
        x: Location of match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1206
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1207
      Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1208
        Overall score for match (0.0 = good, 1.0 = bad).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1209
      """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1210
      accuracy = float(e) / len(pattern)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1211
      proximity = abs(loc - x)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1212
      if not self.Match_Distance:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1213
        # Dodge divide by zero error.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1214
        return proximity and 1.0 or accuracy
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1215
      return accuracy + (proximity / float(self.Match_Distance))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1216
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1217
    # Highest score beyond which we give up.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1218
    score_threshold = self.Match_Threshold
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1219
    # Is there a nearby exact match? (speedup)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1220
    best_loc = text.find(pattern, loc)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1221
    if best_loc != -1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1222
      score_threshold = min(match_bitapScore(0, best_loc), score_threshold)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1223
      # What about in the other direction? (speedup)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1224
      best_loc = text.rfind(pattern, loc + len(pattern))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1225
      if best_loc != -1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1226
        score_threshold = min(match_bitapScore(0, best_loc), score_threshold)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1227
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1228
    # Initialise the bit arrays.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1229
    matchmask = 1 << (len(pattern) - 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1230
    best_loc = -1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1231
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1232
    bin_max = len(pattern) + len(text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1233
    # Empty initialization added to appease pychecker.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1234
    last_rd = None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1235
    for d in xrange(len(pattern)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1236
      # Scan for the best match each iteration allows for one more error.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1237
      # Run a binary search to determine how far from 'loc' we can stray at
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1238
      # this error level.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1239
      bin_min = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1240
      bin_mid = bin_max
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1241
      while bin_min < bin_mid:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1242
        if match_bitapScore(d, loc + bin_mid) <= score_threshold:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1243
          bin_min = bin_mid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1244
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1245
          bin_max = bin_mid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1246
        bin_mid = (bin_max - bin_min) / 2 + bin_min
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1247
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1248
      # Use the result from this iteration as the maximum for the next.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1249
      bin_max = bin_mid
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1250
      start = max(1, loc - bin_mid + 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1251
      finish = min(loc + bin_mid, len(text)) + len(pattern)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1252
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1253
      rd = range(finish + 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1254
      rd.append((1 << d) - 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1255
      for j in xrange(finish, start - 1, -1):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1256
        if len(text) <= j - 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1257
          # Out of range.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1258
          charMatch = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1259
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1260
          charMatch = s.get(text[j - 1], 0)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1261
        if d == 0:  # First pass: exact match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1262
          rd[j] = ((rd[j + 1] << 1) | 1) & charMatch
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1263
        else:  # Subsequent passes: fuzzy match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1264
          rd[j] = ((rd[j + 1] << 1) | 1) & charMatch | (
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1265
              ((last_rd[j + 1] | last_rd[j]) << 1) | 1) | last_rd[j + 1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1266
        if rd[j] & matchmask:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1267
          score = match_bitapScore(d, j - 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1268
          # This match will almost certainly be better than any existing match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1269
          # But check anyway.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1270
          if score <= score_threshold:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1271
            # Told you so.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1272
            score_threshold = score
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1273
            best_loc = j - 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1274
            if best_loc > loc:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1275
              # When passing loc, don't exceed our current distance from loc.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1276
              start = max(1, 2 * loc - best_loc)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1277
            else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1278
              # Already passed loc, downhill from here on in.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1279
              break
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1280
      # No hope for a (better) match at greater error levels.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1281
      if match_bitapScore(d + 1, loc) > score_threshold:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1282
        break
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1283
      last_rd = rd
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1284
    return best_loc
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1285
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1286
  def match_alphabet(self, pattern):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1287
    """Initialise the alphabet for the Bitap algorithm.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1288
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1289
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1290
      pattern: The text to encode.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1291
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1292
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1293
      Hash of character locations.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1294
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1295
    s = {}
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1296
    for char in pattern:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1297
      s[char] = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1298
    for i in xrange(len(pattern)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1299
      s[pattern[i]] |= 1 << (len(pattern) - i - 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1300
    return s
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1301
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1302
  #  PATCH FUNCTIONS
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1303
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1304
  def patch_addContext(self, patch, text):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1305
    """Increase the context until it is unique,
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1306
    but don't let the pattern expand beyond Match_MaxBits.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1307
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1308
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1309
      patch: The patch to grow.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1310
      text: Source text.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1311
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1312
    if len(text) == 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1313
      return
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1314
    pattern = text[patch.start2 : patch.start2 + patch.length1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1315
    padding = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1316
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1317
    # Look for the first and last matches of pattern in text.  If two different
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1318
    # matches are found, increase the pattern length.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1319
    while (text.find(pattern) != text.rfind(pattern) and (self.Match_MaxBits ==
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1320
        0 or len(pattern) < self.Match_MaxBits - self.Patch_Margin -
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1321
        self.Patch_Margin)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1322
      padding += self.Patch_Margin
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1323
      pattern = text[max(0, patch.start2 - padding) :
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1324
                     patch.start2 + patch.length1 + padding]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1325
    # Add one chunk for good luck.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1326
    padding += self.Patch_Margin
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1327
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1328
    # Add the prefix.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1329
    prefix = text[max(0, patch.start2 - padding) : patch.start2]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1330
    if prefix:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1331
      patch.diffs[:0] = [(self.DIFF_EQUAL, prefix)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1332
    # Add the suffix.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1333
    suffix = text[patch.start2 + patch.length1 :
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1334
                  patch.start2 + patch.length1 + padding]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1335
    if suffix:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1336
      patch.diffs.append((self.DIFF_EQUAL, suffix))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1337
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1338
    # Roll back the start points.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1339
    patch.start1 -= len(prefix)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1340
    patch.start2 -= len(prefix)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1341
    # Extend lengths.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1342
    patch.length1 += len(prefix) + len(suffix)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1343
    patch.length2 += len(prefix) + len(suffix)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1344
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1345
  def patch_make(self, a, b=None, c=None):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1346
    """Compute a list of patches to turn text1 into text2.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1347
    Use diffs if provided, otherwise compute it ourselves.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1348
    There are four ways to call this function, depending on what data is
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1349
    available to the caller:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1350
    Method 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1351
    a = text1, b = text2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1352
    Method 2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1353
    a = diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1354
    Method 3 (optimal):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1355
    a = text1, b = diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1356
    Method 4 (deprecated, use method 3):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1357
    a = text1, b = text2, c = diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1358
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1359
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1360
      a: text1 (methods 1,3,4) or Array of diff tuples for text1 to
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1361
          text2 (method 2).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1362
      b: text2 (methods 1,4) or Array of diff tuples for text1 to
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1363
          text2 (method 3) or undefined (method 2).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1364
      c: Array of diff tuples for text1 to text2 (method 4) or
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1365
          undefined (methods 1,2,3).
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1366
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1367
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1368
      Array of patch objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1369
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1370
    text1 = None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1371
    diffs = None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1372
    # Note that texts may arrive as 'str' or 'unicode'.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1373
    if isinstance(a, basestring) and isinstance(b, basestring) and c is None:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1374
      # Method 1: text1, text2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1375
      # Compute diffs from text1 and text2.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1376
      text1 = a
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1377
      diffs = self.diff_main(text1, b, True)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1378
      if len(diffs) > 2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1379
        self.diff_cleanupSemantic(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1380
        self.diff_cleanupEfficiency(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1381
    elif isinstance(a, list) and b is None and c is None:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1382
      # Method 2: diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1383
      # Compute text1 from diffs.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1384
      diffs = a
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1385
      text1 = self.diff_text1(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1386
    elif isinstance(a, basestring) and isinstance(b, list) and c is None:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1387
      # Method 3: text1, diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1388
      text1 = a
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1389
      diffs = b
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1390
    elif (isinstance(a, basestring) and isinstance(b, basestring) and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1391
          isinstance(c, list)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1392
      # Method 4: text1, text2, diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1393
      # text2 is not used.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1394
      text1 = a
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1395
      diffs = c
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1396
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1397
      raise ValueError("Unknown call format to patch_make.")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1398
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1399
    if not diffs:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1400
      return []  # Get rid of the None case.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1401
    patches = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1402
    patch = patch_obj()
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1403
    char_count1 = 0  # Number of characters into the text1 string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1404
    char_count2 = 0  # Number of characters into the text2 string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1405
    prepatch_text = text1  # Recreate the patches to determine context info.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1406
    postpatch_text = text1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1407
    for x in xrange(len(diffs)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1408
      (diff_type, diff_text) = diffs[x]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1409
      if len(patch.diffs) == 0 and diff_type != self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1410
        # A new patch starts here.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1411
        patch.start1 = char_count1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1412
        patch.start2 = char_count2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1413
      if diff_type == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1414
        # Insertion
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1415
        patch.diffs.append(diffs[x])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1416
        patch.length2 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1417
        postpatch_text = (postpatch_text[:char_count2] + diff_text +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1418
                          postpatch_text[char_count2:])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1419
      elif diff_type == self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1420
        # Deletion.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1421
        patch.length1 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1422
        patch.diffs.append(diffs[x])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1423
        postpatch_text = (postpatch_text[:char_count2] +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1424
                          postpatch_text[char_count2 + len(diff_text):])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1425
      elif (diff_type == self.DIFF_EQUAL and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1426
            len(diff_text) <= 2 * self.Patch_Margin and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1427
            len(patch.diffs) != 0 and len(diffs) != x + 1):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1428
        # Small equality inside a patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1429
        patch.diffs.append(diffs[x])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1430
        patch.length1 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1431
        patch.length2 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1432
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1433
      if (diff_type == self.DIFF_EQUAL and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1434
          len(diff_text) >= 2 * self.Patch_Margin):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1435
        # Time for a new patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1436
        if len(patch.diffs) != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1437
          self.patch_addContext(patch, prepatch_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1438
          patches.append(patch)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1439
          patch = patch_obj()
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1440
          # Unlike Unidiff, our patch lists have a rolling context.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1441
          # http://code.google.com/p/google-diff-match-patch/wiki/Unidiff
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1442
          # Update prepatch text & pos to reflect the application of the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1443
          # just completed patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1444
          prepatch_text = postpatch_text
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1445
          char_count1 = char_count2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1446
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1447
      # Update the current character count.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1448
      if diff_type != self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1449
        char_count1 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1450
      if diff_type != self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1451
        char_count2 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1452
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1453
    # Pick up the leftover patch if not empty.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1454
    if len(patch.diffs) != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1455
      self.patch_addContext(patch, prepatch_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1456
      patches.append(patch)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1457
    return patches
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1458
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1459
  def patch_deepCopy(self, patches):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1460
    """Given an array of patches, return another array that is identical.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1461
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1462
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1463
      patches: Array of patch objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1464
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1465
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1466
      Array of patch objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1467
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1468
    patchesCopy = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1469
    for patch in patches:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1470
      patchCopy = patch_obj()
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1471
      # No need to deep copy the tuples since they are immutable.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1472
      patchCopy.diffs = patch.diffs[:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1473
      patchCopy.start1 = patch.start1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1474
      patchCopy.start2 = patch.start2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1475
      patchCopy.length1 = patch.length1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1476
      patchCopy.length2 = patch.length2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1477
      patchesCopy.append(patchCopy)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1478
    return patchesCopy
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1479
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1480
  def patch_apply(self, patches, text):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1481
    """Merge a set of patches onto the text.  Return a patched text, as well
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1482
    as a list of true/false values indicating which patches were applied.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1483
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1484
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1485
      patches: Array of patch objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1486
      text: Old text.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1487
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1488
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1489
      Two element Array, containing the new text and an array of boolean values.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1490
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1491
    if not patches:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1492
      return (text, [])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1493
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1494
    # Deep copy the patches so that no changes are made to originals.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1495
    patches = self.patch_deepCopy(patches)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1496
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1497
    nullPadding = self.patch_addPadding(patches)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1498
    text = nullPadding + text + nullPadding
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1499
    self.patch_splitMax(patches)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1500
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1501
    # delta keeps track of the offset between the expected and actual location
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1502
    # of the previous patch.  If there are patches expected at positions 10 and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1503
    # 20, but the first patch was found at 12, delta is 2 and the second patch
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1504
    # has an effective expected position of 22.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1505
    delta = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1506
    results = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1507
    for patch in patches:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1508
      expected_loc = patch.start2 + delta
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1509
      text1 = self.diff_text1(patch.diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1510
      end_loc = -1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1511
      if len(text1) > self.Match_MaxBits:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1512
        # patch_splitMax will only provide an oversized pattern in the case of
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1513
        # a monster delete.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1514
        start_loc = self.match_main(text, text1[:self.Match_MaxBits],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1515
                                    expected_loc)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1516
        if start_loc != -1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1517
          end_loc = self.match_main(text, text1[-self.Match_MaxBits:],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1518
              expected_loc + len(text1) - self.Match_MaxBits)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1519
          if end_loc == -1 or start_loc >= end_loc:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1520
            # Can't find valid trailing context.  Drop this patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1521
            start_loc = -1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1522
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1523
        start_loc = self.match_main(text, text1, expected_loc)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1524
      if start_loc == -1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1525
        # No match found.  :(
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1526
        results.append(False)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1527
        # Subtract the delta for this failed patch from subsequent patches.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1528
        delta -= patch.length2 - patch.length1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1529
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1530
        # Found a match.  :)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1531
        results.append(True)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1532
        delta = start_loc - expected_loc
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1533
        if end_loc == -1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1534
          text2 = text[start_loc : start_loc + len(text1)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1535
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1536
          text2 = text[start_loc : end_loc + self.Match_MaxBits]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1537
        if text1 == text2:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1538
          # Perfect match, just shove the replacement text in.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1539
          text = (text[:start_loc] + self.diff_text2(patch.diffs) +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1540
                      text[start_loc + len(text1):])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1541
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1542
          # Imperfect match.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1543
          # Run a diff to get a framework of equivalent indices.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1544
          diffs = self.diff_main(text1, text2, False)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1545
          if (len(text1) > self.Match_MaxBits and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1546
              self.diff_levenshtein(diffs) / float(len(text1)) >
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1547
              self.Patch_DeleteThreshold):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1548
            # The end points match, but the content is unacceptably bad.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1549
            results[-1] = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1550
          else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1551
            self.diff_cleanupSemanticLossless(diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1552
            index1 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1553
            for (op, data) in patch.diffs:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1554
              if op != self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1555
                index2 = self.diff_xIndex(diffs, index1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1556
              if op == self.DIFF_INSERT:  # Insertion
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1557
                text = text[:start_loc + index2] + data + text[start_loc +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1558
                                                               index2:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1559
              elif op == self.DIFF_DELETE:  # Deletion
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1560
                text = text[:start_loc + index2] + text[start_loc +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1561
                    self.diff_xIndex(diffs, index1 + len(data)):]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1562
              if op != self.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1563
                index1 += len(data)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1564
    # Strip the padding off.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1565
    text = text[len(nullPadding):-len(nullPadding)]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1566
    return (text, results)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1567
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1568
  def patch_addPadding(self, patches):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1569
    """Add some padding on text start and end so that edges can match
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1570
    something.  Intended to be called only from within patch_apply.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1571
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1572
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1573
      patches: Array of patch objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1574
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1575
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1576
      The padding string added to each side.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1577
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1578
    paddingLength = self.Patch_Margin
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1579
    nullPadding = ""
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1580
    for x in xrange(1, paddingLength + 1):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1581
      nullPadding += chr(x)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1582
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1583
    # Bump all the patches forward.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1584
    for patch in patches:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1585
      patch.start1 += paddingLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1586
      patch.start2 += paddingLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1587
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1588
    # Add some padding on start of first diff.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1589
    patch = patches[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1590
    diffs = patch.diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1591
    if not diffs or diffs[0][0] != self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1592
      # Add nullPadding equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1593
      diffs.insert(0, (self.DIFF_EQUAL, nullPadding))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1594
      patch.start1 -= paddingLength  # Should be 0.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1595
      patch.start2 -= paddingLength  # Should be 0.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1596
      patch.length1 += paddingLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1597
      patch.length2 += paddingLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1598
    elif paddingLength > len(diffs[0][1]):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1599
      # Grow first equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1600
      extraLength = paddingLength - len(diffs[0][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1601
      newText = nullPadding[len(diffs[0][1]):] + diffs[0][1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1602
      diffs[0] = (diffs[0][0], newText)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1603
      patch.start1 -= extraLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1604
      patch.start2 -= extraLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1605
      patch.length1 += extraLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1606
      patch.length2 += extraLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1607
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1608
    # Add some padding on end of last diff.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1609
    patch = patches[-1]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1610
    diffs = patch.diffs
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1611
    if not diffs or diffs[-1][0] != self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1612
      # Add nullPadding equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1613
      diffs.append((self.DIFF_EQUAL, nullPadding))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1614
      patch.length1 += paddingLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1615
      patch.length2 += paddingLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1616
    elif paddingLength > len(diffs[-1][1]):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1617
      # Grow last equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1618
      extraLength = paddingLength - len(diffs[-1][1])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1619
      newText = diffs[-1][1] + nullPadding[:extraLength]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1620
      diffs[-1] = (diffs[-1][0], newText)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1621
      patch.length1 += extraLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1622
      patch.length2 += extraLength
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1623
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1624
    return nullPadding
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1625
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1626
  def patch_splitMax(self, patches):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1627
    """Look through the patches and break up any which are longer than the
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1628
    maximum limit of the match algorithm.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1629
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1630
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1631
      patches: Array of patch objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1632
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1633
    if self.Match_MaxBits == 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1634
      return
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1635
    for x in xrange(len(patches)):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1636
      if patches[x].length1 > self.Match_MaxBits:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1637
        bigpatch = patches[x]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1638
        # Remove the big old patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1639
        del patches[x]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1640
        x -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1641
        patch_size = self.Match_MaxBits
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1642
        start1 = bigpatch.start1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1643
        start2 = bigpatch.start2
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1644
        precontext = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1645
        while len(bigpatch.diffs) != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1646
          # Create one of several smaller patches.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1647
          patch = patch_obj()
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1648
          empty = True
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1649
          patch.start1 = start1 - len(precontext)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1650
          patch.start2 = start2 - len(precontext)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1651
          if precontext:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1652
            patch.length1 = patch.length2 = len(precontext)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1653
            patch.diffs.append((self.DIFF_EQUAL, precontext))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1654
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1655
          while (len(bigpatch.diffs) != 0 and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1656
                 patch.length1 < patch_size - self.Patch_Margin):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1657
            (diff_type, diff_text) = bigpatch.diffs[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1658
            if diff_type == self.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1659
              # Insertions are harmless.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1660
              patch.length2 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1661
              start2 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1662
              patch.diffs.append(bigpatch.diffs.pop(0))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1663
              empty = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1664
            elif (diff_type == self.DIFF_DELETE and len(patch.diffs) == 1 and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1665
                patch.diffs[0][0] == self.DIFF_EQUAL and
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1666
                len(diff_text) > 2 * patch_size):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1667
              # This is a large deletion.  Let it pass in one chunk.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1668
              patch.length1 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1669
              start1 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1670
              empty = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1671
              patch.diffs.append((diff_type, diff_text))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1672
              del bigpatch.diffs[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1673
            else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1674
              # Deletion or equality.  Only take as much as we can stomach.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1675
              diff_text = diff_text[:patch_size - patch.length1 -
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1676
                                    self.Patch_Margin]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1677
              patch.length1 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1678
              start1 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1679
              if diff_type == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1680
                patch.length2 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1681
                start2 += len(diff_text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1682
              else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1683
                empty = False
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1684
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1685
              patch.diffs.append((diff_type, diff_text))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1686
              if diff_text == bigpatch.diffs[0][1]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1687
                del bigpatch.diffs[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1688
              else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1689
                bigpatch.diffs[0] = (bigpatch.diffs[0][0],
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1690
                                     bigpatch.diffs[0][1][len(diff_text):])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1691
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1692
          # Compute the head context for the next patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1693
          precontext = self.diff_text2(patch.diffs)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1694
          precontext = precontext[-self.Patch_Margin:]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1695
          # Append the end context for this patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1696
          postcontext = self.diff_text1(bigpatch.diffs)[:self.Patch_Margin]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1697
          if postcontext:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1698
            patch.length1 += len(postcontext)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1699
            patch.length2 += len(postcontext)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1700
            if len(patch.diffs) != 0 and patch.diffs[-1][0] == self.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1701
              patch.diffs[-1] = (self.DIFF_EQUAL, patch.diffs[-1][1] +
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1702
                                 postcontext)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1703
            else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1704
              patch.diffs.append((self.DIFF_EQUAL, postcontext))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1705
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1706
          if not empty:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1707
            x += 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1708
            patches.insert(x, patch)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1709
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1710
  def patch_toText(self, patches):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1711
    """Take a list of patches and return a textual representation.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1712
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1713
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1714
      patches: Array of patch objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1715
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1716
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1717
      Text representation of patches.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1718
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1719
    text = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1720
    for patch in patches:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1721
      text.append(str(patch))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1722
    return "".join(text)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1723
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1724
  def patch_fromText(self, textline):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1725
    """Parse a textual representation of patches and return a list of patch
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1726
    objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1727
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1728
    Args:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1729
      textline: Text representation of patches.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1730
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1731
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1732
      Array of patch objects.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1733
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1734
    Raises:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1735
      ValueError: If invalid input.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1736
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1737
    if type(textline) == unicode:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1738
      # Patches should be composed of a subset of ascii chars, Unicode not
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1739
      # required.  If this encode raises UnicodeEncodeError, patch is invalid.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1740
      textline = textline.encode("ascii")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1741
    patches = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1742
    if not textline:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1743
      return patches
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1744
    text = textline.split('\n')
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1745
    while len(text) != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1746
      m = re.match("^@@ -(\d+),?(\d*) \+(\d+),?(\d*) @@$", text[0])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1747
      if not m:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1748
        raise ValueError, "Invalid patch string: " + text[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1749
      patch = patch_obj()
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1750
      patches.append(patch)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1751
      patch.start1 = int(m.group(1))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1752
      if m.group(2) == '':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1753
        patch.start1 -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1754
        patch.length1 = 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1755
      elif m.group(2) == '0':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1756
        patch.length1 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1757
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1758
        patch.start1 -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1759
        patch.length1 = int(m.group(2))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1760
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1761
      patch.start2 = int(m.group(3))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1762
      if m.group(4) == '':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1763
        patch.start2 -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1764
        patch.length2 = 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1765
      elif m.group(4) == '0':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1766
        patch.length2 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1767
      else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1768
        patch.start2 -= 1
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1769
        patch.length2 = int(m.group(4))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1770
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1771
      del text[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1772
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1773
      while len(text) != 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1774
        if text[0]:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1775
          sign = text[0][0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1776
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1777
          sign = ''
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1778
        line = urllib.unquote(text[0][1:])
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1779
        line = line.decode("utf-8")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1780
        if sign == '+':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1781
          # Insertion.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1782
          patch.diffs.append((self.DIFF_INSERT, line))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1783
        elif sign == '-':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1784
          # Deletion.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1785
          patch.diffs.append((self.DIFF_DELETE, line))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1786
        elif sign == ' ':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1787
          # Minor equality.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1788
          patch.diffs.append((self.DIFF_EQUAL, line))
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1789
        elif sign == '@':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1790
          # Start of next patch.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1791
          break
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1792
        elif sign == '':
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1793
          # Blank line?  Whatever.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1794
          pass
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1795
        else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1796
          # WTF?
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1797
          raise ValueError, "Invalid patch mode: '%s'\n%s" % (sign, line)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1798
        del text[0]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1799
    return patches
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1800
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1801
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1802
class patch_obj:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1803
  """Class representing one patch operation.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1804
  """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1805
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1806
  def __init__(self):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1807
    """Initializes with an empty list of diffs.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1808
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1809
    self.diffs = []
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1810
    self.start1 = None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1811
    self.start2 = None
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1812
    self.length1 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1813
    self.length2 = 0
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1814
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1815
  def __str__(self):
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1816
    """Emmulate GNU diff's format.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1817
    Header: @@ -382,8 +481,9 @@
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1818
    Indicies are printed as 1-based, not 0-based.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1819
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1820
    Returns:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1821
      The GNU diff string.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1822
    """
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1823
    if self.length1 == 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1824
      coords1 = str(self.start1) + ",0"
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1825
    elif self.length1 == 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1826
      coords1 = str(self.start1 + 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1827
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1828
      coords1 = str(self.start1 + 1) + "," + str(self.length1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1829
    if self.length2 == 0:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1830
      coords2 = str(self.start2) + ",0"
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1831
    elif self.length2 == 1:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1832
      coords2 = str(self.start2 + 1)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1833
    else:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1834
      coords2 = str(self.start2 + 1) + "," + str(self.length2)
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1835
    text = ["@@ -", coords1, " +", coords2, " @@\n"]
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1836
    # Escape the body of the patch with %xx notation.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1837
    for (op, data) in self.diffs:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1838
      if op == diff_match_patch.DIFF_INSERT:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1839
        text.append("+")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1840
      elif op == diff_match_patch.DIFF_DELETE:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1841
        text.append("-")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1842
      elif op == diff_match_patch.DIFF_EQUAL:
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1843
        text.append(" ")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1844
      # High ascii will raise UnicodeDecodeError.  Use Unicode instead.
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1845
      data = data.encode("utf-8")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1846
      text.append(urllib.quote(data, "!~*'();/?:@&=+$,# ") + "\n")
cae2de810f77 improve diff between versions (word diff)
raph
parents:
diff changeset
  1847
    return "".join(text)