
Commit 9217c51

Reimplemented ReinsertionDistance to use O(n lg n) alg for longest common subsequence
1 parent c3e413c commit 9217c51


4 files changed, +76 -8 lines changed

docs/api/org/cicirello/permutations/distance/ReinsertionDistance.html

Lines changed: 17 additions & 5 deletions
@@ -2,9 +2,9 @@
 <!-- NewPage -->
 <html lang="en">
 <head>
-<!-- Generated by javadoc (1.8.0_05) on Wed Aug 08 13:08:41 EDT 2018 -->
+<!-- Generated by javadoc (1.8.0_05) on Fri Aug 17 10:57:33 EDT 2018 -->
 <title>ReinsertionDistance (JavaPermutationTools - A Java API for computation on permutations)</title>
-<meta name="date" content="2018-08-08">
+<meta name="date" content="2018-08-17">
 <link rel="stylesheet" type="text/css" href="../../../../stylesheet.css" title="Style">
 <script type="text/javascript" src="../../../../script.js"></script>
 </head>
@@ -127,19 +127,31 @@ <h2 title="Class ReinsertionDistance" class="title">Class ReinsertionDistance</h
 <p>This implementation utilizes the observation that the elements that must be removed and reinserted
 are exactly those elements that are not in the longest common subsequence.</p>
 
-<p>Runtime: O(n^2), where n is the permutation length.</p>
+<p>Runtime: O(n lg n), where n is the permutation length.</p>
 
 <p>Reinsertion distance more generally was described in:<br>
 V. A. Cicirello and R. Cernera, <a href="https://www.cicirello.org/publications/cicirello2013flairs.html" target=_top>"Profiling the distance characteristics
 of mutation operators for permutation-based genetic algorithms,"</a>
 in Proceedings of the 26th FLAIRS Conference. AAAI Press, May 2013, pp. 46–51.</p>
 
-<p>However, in that paper, it was computed using an adaptation of string Edit Distance.</p>
+<p>However, in that paper, it was computed, in O(n^2) time, using an adaptation of string Edit Distance.</p>
 
 <p>For description of computing it using the length of the longest common subsequence, see:<br>
 V.A. Cicirello, <a href="https://www.cicirello.org/publications/cicirello2016evc.html" target=_top>"The Permutation in a Haystack Problem
 and the Calculus of Search Landscapes,"</a>
-IEEE Transactions on Evolutionary Computation, 20(3):434-446, June 2016.</p></div>
+IEEE Transactions on Evolutionary Computation, 20(3):434-446, June 2016.</p>
+
+<p>However, that paper used an O(n^2) time algorithm for longest common subsequence. This class has been
+updated to use a more efficient O(n lg n) algorithm for longest common subsequence. It is a version of
+Hunt et al.'s algorithm
+that has been optimized to assume permutations of the integers in [0, (n-1)] with unique elements.
+The original algorithm of Hunt et al. was for general strings that could contain duplicates and which could consist
+of characters of any alphabet. In that more general case, O(n lg n) was the best case runtime. In our
+special case, O(n lg n) is worst case runtime.</p>
+
+<p>See the following for complete details of Hunt et al.'s algorithm for longest common subsequence:<br>
+J.W. Hunt and T.G. Szymanski, "A fast algorithm for computing longest common subsequences,"
+Communications of the ACM, 20(5):350-353, May 1977.</p></div>
 <dl>
 <dt><span class="simpleTagLabel">Since:</span></dt>
 <dd>1.0</dd>
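For comparison with the O(n^2) figure mentioned above, here is a minimal sketch of the quadratic route: reinsertion distance as n minus the length of a longest common subsequence, with the LCS computed by the standard dynamic program. It is not the code from the 2016 paper or the commit's `lcsOLD` method; the class and method names are hypothetical, and permutations are assumed to be plain `int[]` arrays rather than JPT `Permutation` objects.

```java
/** Hypothetical sketch: O(n^2) reinsertion distance via the textbook LCS dynamic program. */
public class QuadraticReinsertion {

    /** Length of a longest common subsequence of a and b, in O(a.length * b.length) time. */
    static int lcsLength(int[] a, int[] b) {
        // prev[j] = LCS length of the previous prefix of a and b[0..j-1]; two rows keep memory O(n)
        int[] prev = new int[b.length + 1];
        int[] curr = new int[b.length + 1];
        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= b.length; j++) {
                if (a[i - 1] == b[j - 1]) {
                    curr[j] = prev[j - 1] + 1;                // extend a common subsequence
                } else {
                    curr[j] = Math.max(prev[j], curr[j - 1]); // skip an element of a or of b
                }
            }
            int[] tmp = prev; // the finished row becomes the "previous" row
            prev = curr;
            curr = tmp;
        }
        return prev[b.length];
    }

    /** Elements outside a longest common subsequence are exactly those that must be reinserted. */
    static int distance(int[] p1, int[] p2) {
        return p1.length - lcsLength(p1, p2);
    }

    public static void main(String[] args) {
        int[] p1 = {0, 1, 2, 3, 4, 5};
        int[] p2 = {1, 2, 0, 3, 5, 4};
        System.out.println(distance(p1, p2)); // prints 2
    }
}
```

The inner loop compares every pair of positions, which is where the O(n^2) bound comes from. The permutation-specific method in this commit only needs to look at the n matching pairs, since each element occurs exactly once in each permutation.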

lib/jpt1.jar

329 Bytes
Binary file not shown.

src/org/cicirello/permutations/distance/ReinsertionDistance.java

Lines changed: 51 additions & 3 deletions
@@ -30,22 +30,34 @@
  * <p>This implementation utilizes the observation that the elements that must be removed and reinserted
  * are exactly those elements that are not in the longest common subsequence.</p>
  *
- * <p>Runtime: O(n^2), where n is the permutation length.</p>
+ * <p>Runtime: O(n lg n), where n is the permutation length.</p>
  *
  * <p>Reinsertion distance more generally was described in:<br>
  * V. A. Cicirello and R. Cernera, <a href="https://www.cicirello.org/publications/cicirello2013flairs.html" target=_top>"Profiling the distance characteristics
  * of mutation operators for permutation-based genetic algorithms,"</a>
  * in Proceedings of the 26th FLAIRS Conference. AAAI Press, May 2013, pp. 46–51.</p>
  *
- * <p>However, in that paper, it was computed using an adaptation of string Edit Distance.</p>
+ * <p>However, in that paper, it was computed, in O(n^2) time, using an adaptation of string Edit Distance.</p>
  *
  * <p>For description of computing it using the length of the longest common subsequence, see:<br>
  * V.A. Cicirello, <a href="https://www.cicirello.org/publications/cicirello2016evc.html" target=_top>"The Permutation in a Haystack Problem
  * and the Calculus of Search Landscapes,"</a>
  * IEEE Transactions on Evolutionary Computation, 20(3):434-446, June 2016.</p>
  *
+ * <p>However, that paper used an O(n^2) time algorithm for longest common subsequence. This class has been
+ * updated to use a more efficient O(n lg n) algorithm for longest common subsequence. It is a version of
+ * Hunt et al.'s algorithm
+ * that has been optimized to assume permutations of the integers in [0, (n-1)] with unique elements.
+ * The original algorithm of Hunt et al. was for general strings that could contain duplicates and which could consist
+ * of characters of any alphabet. In that more general case, O(n lg n) was the best case runtime. In our
+ * special case, O(n lg n) is worst case runtime.</p>
+ *
+ * <p>See the following for complete details of Hunt et al.'s algorithm for longest common subsequence:<br>
+ * J.W. Hunt and T.G. Szymanski, "A fast algorithm for computing longest common subsequences,"
+ * Communications of the ACM, 20(5):350-353, May 1977.</p>
+ *
  * @author <a href=https://www.cicirello.org/ target=_top>Vincent A. Cicirello</a>, <a href=https://www.cicirello.org/ target=_top>https://www.cicirello.org/</a>
- * @version 2.18.8.2
+ * @version 2.18.8.17
  * @since 1.0
  *
  */
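The class comment's observation, that the elements to remove and reinsert are exactly those outside a longest common subsequence, can be made concrete by recovering one such subsequence rather than only its length. The sketch below is hypothetical and goes beyond this commit, which computes only the length: it maps each element of p1 to its position in p2, finds a longest increasing run of those positions with the same threshold-plus-binary-search idea, keeps predecessor links, and returns the elements that fall outside. Permutations are assumed to be `int[]` arrays of 0..n-1, and `ElementsToReinsert`/`elementsToMove` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: list the elements that an optimal remove-and-reinsert edit must move. */
public class ElementsToReinsert {

    static List<Integer> elementsToMove(int[] p1, int[] p2) {
        int n = p1.length;
        int[] inv = new int[n];                 // inv[v] = position of element v in p2
        for (int i = 0; i < n; i++) inv[p2[i]] = i;
        int[] match = new int[n];               // match[i] = position in p2 of p1[i]
        for (int i = 0; i < n; i++) match[i] = inv[p1[i]];

        // tailVal[k] = smallest match value ending an increasing run of length k+1;
        // tailIdx[k] = index in p1 where that run ends; pred[i] links i to the previous run element
        int[] tailVal = new int[n];
        int[] tailIdx = new int[n];
        int[] pred = new int[n];
        int len = 0;
        for (int i = 0; i < n; i++) {
            int pos = Arrays.binarySearch(tailVal, 0, len, match[i]);
            if (pos < 0) pos = -pos - 1;        // insertion point (values are distinct, so never found)
            tailVal[pos] = match[i];
            tailIdx[pos] = i;
            pred[i] = pos > 0 ? tailIdx[pos - 1] : -1;
            if (pos == len) len++;
        }

        // Walk the predecessor links back from the end of a longest run and mark those indices
        boolean[] keep = new boolean[n];
        for (int i = len > 0 ? tailIdx[len - 1] : -1; i >= 0; i = pred[i]) keep[i] = true;

        List<Integer> moved = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            if (!keep[i]) moved.add(p1[i]);     // not on the common subsequence: must be reinserted
        }
        return moved;
    }

    public static void main(String[] args) {
        int[] p1 = {0, 1, 2, 3, 4, 5};
        int[] p2 = {1, 2, 0, 3, 5, 4};
        System.out.println(elementsToMove(p1, p2)); // prints [0, 4]
    }
}
```

With p2 = [1, 2, 0, 3, 5, 4], the positions in p2 of 0..5 are [2, 0, 1, 3, 5, 4]; one longest increasing run is 0, 1, 3, 4 (elements 1, 2, 3, 5), so 0 and 4 are the two elements to move, matching a reinsertion distance of 2. When several longest common subsequences exist, the processing order decides which one is reported.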
@@ -61,7 +73,43 @@ public int distance(Permutation p1, Permutation p2) {
         return p1.length() - lcs(p1,p2);
     }
 
+    // This version runs in O(n lg n)
     private int lcs(Permutation p1, Permutation p2) {
+        int n = p1.length();
+        int[] inv = p2.getInverse();
+        int[] match = new int[n];
+        int[] thresh = new int[n+1];
+        thresh[0] = -1;
+        for (int i = 0; i < n; i++) {
+            match[i] = inv[p1.get(i)];
+            thresh[i+1] = n;
+        }
+        int maxK = 0;
+        for (int i = 0; i < n; i++) {
+            int j = match[i];
+            int k = binSearch(thresh, j, 0, maxK+1);
+            if (j < thresh[k]) {
+                thresh[k] = j;
+                if (k > maxK) maxK = k;
+            }
+        }
+        return maxK;
+    }
+
+    private int binSearch(int[] array, int value, int low, int high) {
+        if (high == low) return low;
+        int mid = (high+low) / 2;
+        if (value <= array[mid] && value > array[mid-1]) {
+            return mid;
+        } else if (value > array[mid]) {
+            return binSearch(array, value, mid+1, high);
+        } else {
+            return binSearch(array, value, low, mid-1);
+        }
+    }
+
+    // OLD O(n^2) Version: Keep temporarily
+    private int lcsOLD(Permutation p1, Permutation p2) {
         int L1 = p1.length();
         int L2 = p2.length();
         int start = L1;
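The recursive `binSearch` returns the index k in `thresh[0..maxK+1]` with `thresh[k-1] < j <= thresh[k]`. Because positions in a permutation are distinct, j is never already present, so `java.util.Arrays.binarySearch` can locate the same slot through its encoded insertion point. The sketch below is a hypothetical standalone variant on `int[]` arrays (not the committed code and not JPT's `Permutation` API); it searches only the filled prefix of `thresh`, so `thresh[k]` here plays the role of `thresh[k+1]` in the commit.

```java
import java.util.Arrays;

/** Hypothetical sketch: O(n lg n) LCS of two permutations of 0..n-1, iterative form. */
public class PermutationLcs {

    static int lcs(int[] p1, int[] p2) {
        int n = p1.length;
        // inv[v] = index of element v in p2 (the role p2.getInverse() plays in the commit)
        int[] inv = new int[n];
        for (int i = 0; i < n; i++) inv[p2[i]] = i;
        // thresh[k] = smallest p2-position that ends a common subsequence of length k+1
        int[] thresh = new int[n];
        int maxK = 0; // length of the longest common subsequence found so far
        for (int i = 0; i < n; i++) {
            int j = inv[p1[i]];
            // j is never found (positions are distinct), so the result is -(insertionPoint) - 1
            int k = -Arrays.binarySearch(thresh, 0, maxK, j) - 1;
            thresh[k] = j;            // j is a smaller (or the first) end for length k+1
            if (k == maxK) maxK++;    // j extended the longest subsequence so far
        }
        return maxK;
    }

    static int reinsertionDistance(int[] p1, int[] p2) {
        return p1.length - lcs(p1, p2);
    }

    public static void main(String[] args) {
        int[] p1 = {0, 1, 2, 3, 4, 5};
        int[] p2 = {5, 4, 3, 2, 1, 0};
        System.out.println(reinsertionDistance(p1, p2)); // prints 5
    }
}
```

Restricting the search to the filled prefix means an insertion point one past the end simply extends the subsequence, so neither the `-1` sentinel nor the padding of `thresh` with `n` is needed.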

tests/org/cicirello/permutations/distance/PermutationDistanceTests.java

Lines changed: 8 additions & 0 deletions
@@ -268,6 +268,14 @@ public void testReinsertionDistance() {
         assertEquals("all but 2 on longest common subsequence", 2, d.distance(p1,p2));
         assertEquals("all but 3 on longest common subsequence", 3, d.distance(p1,p3));
         assertEquals("all but 3 on longest common subsequence", 3, d.distance(p1,p4));
+
+        Permutation p = new Permutation(6);
+        EditDistance edit = new EditDistance();
+        for (Permutation q : p) {
+            // NOTE: If this assertion fails, problem is either in ReinsertionDistance or EditDistance
+            // Should correspond if they are both correct.
+            assertEquals("equiv of edit with 0.5 cost removes and inserts", edit.distancef(p,q), d.distancef(p,q), EPSILON);
+        }
     }
 
     @Test
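The new test relies on reinsertion distance agreeing with an edit distance whose removals and insertions each cost 0.5, as its assertion message says. The reason: in any alignment of two length-n permutations with m matches and s changes, there are n - m - s deletions and as many insertions, so with a change cost of 1 the total is s + 0.5 * 2(n - m - s) = n - m, which is smallest when m is the LCS length. The sketch below is a hypothetical, JPT-independent version of the same exhaustive cross-check: it fixes the first permutation to the identity, enumerates every permutation of a small length in lexicographic order, and compares a weighted Wagner-Fischer edit distance with n minus the LCS. The change cost of 1 is an assumption for illustration; it does not restate the defaults of JPT's `EditDistance`.

```java
import java.util.Arrays;

/** Hypothetical sketch: exhaustively compare weighted edit distance with n - LCS. */
public class EditVsReinsertionCheck {

    static final double INSERT = 0.5, DELETE = 0.5, CHANGE = 1.0; // assumed costs

    /** Weighted edit distance via the standard Wagner-Fischer dynamic program. */
    static double editDistance(int[] a, int[] b) {
        double[][] d = new double[a.length + 1][b.length + 1];
        for (int i = 1; i <= a.length; i++) d[i][0] = i * DELETE;
        for (int j = 1; j <= b.length; j++) d[0][j] = j * INSERT;
        for (int i = 1; i <= a.length; i++) {
            for (int j = 1; j <= b.length; j++) {
                double sub = d[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? 0 : CHANGE);
                d[i][j] = Math.min(sub, Math.min(d[i - 1][j] + DELETE, d[i][j - 1] + INSERT));
            }
        }
        return d[a.length][b.length];
    }

    /** n minus the LCS length, using the O(n lg n) permutation-specific method. */
    static int reinsertionDistance(int[] p1, int[] p2) {
        int n = p1.length;
        int[] inv = new int[n];
        for (int i = 0; i < n; i++) inv[p2[i]] = i;
        int[] thresh = new int[n];
        int len = 0;
        for (int v : p1) {
            int k = -Arrays.binarySearch(thresh, 0, len, inv[v]) - 1; // insertion point
            thresh[k] = inv[v];
            if (k == len) len++;
        }
        return n - len;
    }

    /** Advances p to the next permutation in lexicographic order; returns false after the last. */
    static boolean nextPermutation(int[] p) {
        int i = p.length - 2;
        while (i >= 0 && p[i] >= p[i + 1]) i--;
        if (i < 0) return false;
        int j = p.length - 1;
        while (p[j] <= p[i]) j--;
        int t = p[i]; p[i] = p[j]; p[j] = t;
        for (int l = i + 1, r = p.length - 1; l < r; l++, r--) {
            t = p[l]; p[l] = p[r]; p[r] = t;
        }
        return true;
    }

    /** Number of permutations q of length n where the two distances from the identity differ. */
    static int countMismatches(int n) {
        int[] identity = new int[n];
        for (int i = 0; i < n; i++) identity[i] = i;
        int[] q = identity.clone();
        int mismatches = 0;
        do {
            if (Math.abs(editDistance(identity, q) - reinsertionDistance(identity, q)) > 1e-10) {
                mismatches++;
            }
        } while (nextPermutation(q));
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println(countMismatches(6)); // prints 0
    }
}
```

Setting CHANGE below 1 would let a single change undercut a delete plus insert, and the two distances would then diverge; that is why the cross-check only holds for change costs of at least the sum of the insertion and deletion costs.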
