Commit 4073590

feat(tutorial): z-algorithm
1 parent 072cc9d commit 4073590

1 file changed

+128
-83
lines changed

tutorials/strings/z-algorithm.md

Lines changed: 128 additions & 83 deletions
@@ -1,53 +1,126 @@
11
---
22
title: "Z Algorithm"
3-
description: 'TBC'
4-
draft: true
3+
description: "Z Algorithm is a string matching algorithm with O(n) complexity"
54
keywords:
65
- leetcode
76
- tutorial
87
- z algorithm
98
- algorithm
109
---
11-
<TutorialAuthors names="@Cyber-Machine"/>
1210

13-
## Overview
14-
Z-algorithm is used in pattern matching in string.
11+
<TutorialAuthors names="@wingkwong, @Cyber-Machine"/>
1512

16-
Let's take a look at Leetcode problem [796. Rotate String](https://leetcode.com/problems/rotate-string/).
13+
## Overview
1714

15+
The Z-algorithm preprocesses a string $S$ of length $n$ in $O(n)$ time and produces the Z-array, where $Z[i]$ is the length of the longest substring starting at index $i$ that is also a prefix of $S$. Once the Z-array is available, problems such as finding every occurrence of a pattern in a text can be solved in linear time, which makes the algorithm a good fit whenever fast and reliable string searching is required.
1816

19-
>Given two strings s and goal, return true if and only if s can become goal after some number of shifts on s.
17+
### How it works
18+
19+
1. **Initialization**:
20+
21+
- Let $S$ be the string for which we want to compute the Z-values.
22+
- Let $n$ be the length of $S$.
23+
- Initialize an array $Z$ of the same length $n$ where each value is 0.
24+
- Set the boundaries of the initial comparison window, $l = 0$ and $r = 0$.
25+
26+
2. **Iterate over $S$ from the second character to the end**:
27+
28+
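- For each index $i$, determine the Z-value as follows:
  - If $i > r$, you do not have any previous information about the substrings that start from $i$, so compare $S[i]$ onwards with the beginning of $S$ until you encounter a mismatch. Set $Z[i]$ to the number of matched characters and, if at least one character matched, set $l = i$ and $r = i + Z[i] - 1$.
  - If $i \leq r$, then $i$ is within the bounds of a previously matched window $S[l..r]$, which equals the prefix $S[0..r - l]$. Use previously computed Z-values to potentially skip some comparisons:
    - Compute $k = i - l$.
    - If $Z[k] < r - i + 1$, then $Z[i] = Z[k]$.
    - Otherwise, the match reaches the end of the window, so start with $Z[i] = r - i + 1$, continue comparing from $S[r + 1]$ onwards, and set $l = i$ and $r = i + Z[i] - 1$ if the match extended past $r$.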
33+
34+
3. **Using the Z-array**:
35+
- The Z-array can be used directly to solve various string matching problems, such as finding a pattern within a text.
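
The steps above can be sketched as a standalone function. This is a minimal sketch rather than a definitive implementation: it assumes 0-based indexing, an inclusive window $[l, r]$, and the convention that $Z[0]$ is left at 0. The name `z_function` is chosen here for illustration.

```python
def z_function(s):
    """Return the Z-array of s, with z[0] left at 0 by convention."""
    n = len(s)
    z = [0] * n
    l, r = 0, 0  # inclusive bounds of the rightmost prefix match found so far
    for i in range(1, n):
        if i <= r:
            # Reuse the value mirrored at k = i - l, capped at the window end
            z[i] = min(r - i + 1, z[i - l])
        # Extend the match by direct character comparison
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        # Move the window if this match reaches further to the right
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
    return z


print(z_function("aaaaa"))
print(z_function("abcabcab"))
```

Taking the minimum covers both sub-cases of $i \leq r$ in one line: when $Z[k]$ is shorter than the rest of the window the loop stops at its first comparison, otherwise it resumes right after $r$.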
36+
37+
### Dry Run
38+
39+
We'll take the string $S = "abacabad"$ to demonstrate how Z-values are computed step-by-step. Our goal is to compute the Z-array for $S$. The Z-array, $Z$, will store the length of the longest substring starting from $S[i]$ that matches a prefix of $S$.
40+
41+
#### Iteration #1

- $ i = 1 $ is greater than $ r = 0 $, so we need to compare from $S[1]$ onwards.
- We compare $S[1]$ with $S[0]$ and find no match (because 'b' ≠ 'a').
- Thus, $ Z[1] = 0 $, and $ l $ and $ r $ remain unchanged.

#### Iteration #2

- $ i = 2 $ is greater than $ r $.
- We start comparing from $S[2]$ onwards. $S[2]$ matches $S[0]$, but $S[3]$ does not match $S[1]$ (because 'c' ≠ 'b').
- So, $ Z[2] = 1 $, $ l = 2 $, and $ r = 2 $.

#### Iteration #3

- $ i = 3 $ is greater than $ r = 2 $.
- We compare $S[3]$ with $S[0]$ and find no match (because 'c' ≠ 'a').
- So, $ Z[3] = 0 $. No need to adjust $ l $ and $ r $.

#### Iteration #4

- $ i = 4 $ is greater than $ r $.
- Compare $S[4]$ onwards. $S[4]$ matches $S[0]$, $S[5]$ matches $S[1]$, $S[6]$ matches $S[2]$, but $S[7]$ does not match $S[3]$ (because 'd' ≠ 'c').
- So, $ Z[4] = 3 $, $ l = 4 $, and $ r = 6 $.

#### Iteration #5

- $ i = 5 $ is within the bounds of $ l $ and $ r $. Here, $ k = i - l = 5 - 4 = 1 $.
- $ Z[5] = Z[1] = 0 $ because $ Z[1] = 0 < r - i + 1 = 2 $. No comparison is needed.

#### Iteration #6

- $ i = 6 $ is also within the bounds. Here, $ k = i - l = 6 - 4 = 2 $.
- $ Z[2] = 1 $ is not less than $ r - i + 1 = 1 $, so we start with $ Z[6] = 1 $ and continue comparing from $S[7]$. $S[7]$ does not match $S[1]$ (because 'd' ≠ 'b').
- So, $ Z[6] = 1 $. The match does not extend past $ r $, so $ l $ and $ r $ remain unchanged.

#### Iteration #7

- $ i = 7 $ is greater than $ r = 6 $.
- We compare $S[7]$ with $S[0]$ and find no match (because 'd' ≠ 'a').
- So, $ Z[7] = 0 $.

The final Z-array is $[0, 0, 1, 0, 3, 0, 1, 0]$.

- **Z[2] = 1**: "a" at index 2 matches the prefix "a".
- **Z[4] = 3**: "aba" at index 4 matches the prefix "aba".
- **Z[6] = 1**: "a" at index 6 matches the prefix "a".
- Every other position starts with a character other than 'a', so its Z-value is 0.
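
A hand-computed dry run is easy to get wrong, so it can help to cross-check it against the definition. The sketch below is a brute-force check and not the Z-algorithm itself: it compares every suffix with the prefix directly, which takes $O(n^2)$ time in the worst case. The name `z_naive` is made up for this example.

```python
def z_naive(s):
    """Z-array computed directly from the definition in O(n^2) time."""
    n = len(s)
    z = [0] * n  # z[0] is left at 0 by convention
    for i in range(1, n):
        # Count how many characters of s[i:] agree with the prefix of s
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
    return z


# Print each index of the dry-run string next to its Z-value
for i, value in enumerate(z_naive("abacabad")):
    print(i, value)
```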
86+
87+
### Complexity Analysis
88+
89+
The Z-algorithm has the following complexities:
90+
91+
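Time Complexity: $O(n)$, where $n$ is the length of the string. Although the algorithm contains a nested comparison loop, every successful character comparison moves $r$ one step to the right and $r$ never decreases, so there are at most $n$ successful comparisons in total, plus at most one failed comparison per index.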
92+
93+
Space Complexity: $O(n)$, due to the storage of the Z-array which contains an entry for each position in the string, matching the string's length.
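
The linear bound can also be observed empirically. The sketch below is an instrumented variant written for this illustration (the name `z_with_count` is an assumption, not part of the tutorial's solutions); it counts character comparisons, and the total should stay below $2n$ even on highly repetitive inputs.

```python
def z_with_count(s):
    """Return (Z-array, number of character comparisons performed)."""
    n = len(s)
    z = [0] * n
    l, r = 0, 0
    comparisons = 0
    for i in range(1, n):
        if i <= r:
            z[i] = min(r - i + 1, z[i - l])
        while i + z[i] < n:
            comparisons += 1  # one character comparison per loop pass
            if s[z[i]] != s[i + z[i]]:
                break
            z[i] += 1
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
    return z, comparisons


# Print the input length next to the number of comparisons performed
for text in ("a" * 1000, "ab" * 500, "abacabad" * 125):
    _, comparisons = z_with_count(text)
    print(len(text), comparisons)
```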
94+
95+
## Example #1: [796. Rotate String](https://leetcode.com/problems/rotate-string/)
96+
97+
> Given two strings s and goal, return true if and only if s can become goal after some number of shifts on s.
2098
>
21-
>A shift on s consists of moving the leftmost character of s to the rightmost position.
99+
> A shift on s consists of moving the leftmost character of s to the rightmost position.
22100
>
23-
>For example, if s = "abcde", then it will be "bcdea" after one shift.
101+
> For example, if s = "abcde", then it will be "bcdea" after one shift.
24102
25-
Since the string has been cyclically shifted we have to basically check if goal appears when we concat our initial string two times. One way to check the occurance of this is to run two loops and check the occurance of goal in the
103+
Since a rotation is a cyclic shift, goal is a rotation of s exactly when both strings have the same length and goal occurs in s concatenated with itself. One way to check this occurrence is to run two nested loops that try every starting position in the concatenation, which takes quadratic time. Using the Z-algorithm, we can solve the problem in linear time.
27105

28-
In z-algorithm we create an initial array known as *z-array* which stores the length of longest prefix substring that occurs in our string.
106+
In the Z-algorithm we first build an array known as the _z-array_, which records how much of the string's prefix reappears at each position.
29107

30-
The Z array, at each index, stores the length of the longest substring starting the string till the index, matching the prefix(starting characters) of the string.
108+
More precisely, the Z array at index $i$ stores the length of the longest substring starting at index $i$ that matches the prefix (starting characters) of the string.
31109

32110
The Z algorithm speeds up its execution by reusing values computed earlier. It keeps a window covering the rightmost substring known to match a prefix. For an index inside the window, the value already computed at the mirrored position near the start of the string tells us the match length, as long as that match stays inside the window, so those characters are not compared again. Only a match that reaches the end of the window is extended by direct comparison.
33111

34-
By doing this we basically have to check the occurance of goal in our concatenation of given string. We also add a character in the beginning of string which are not present in the string (usually `#` or `$`) to distinguish between the string to be searched and the given string.
35-
36-
For Example :
112+
With this, we only have to check whether goal occurs in the concatenation of the given string with itself. We join the string to be searched for and the string to be searched with a separator character that occurs in neither of them (usually `#` or `$`), so that no match can run across the boundary. For example,
37113

38-
$$String$$ = $$abcde$$
114+
$$String$$ = $$abcde$$
39115

116+
$$Pattern = bcdea$$
40117

41-
$$Pattern = bcdea$$
118+
$$New String$$ = $$bcdea\$abcdeabcde$$

$$\text{Z-array} = 0000000500004000$$
43121

44-
$$New String$$ = abcde$bcdeabcdea
122+
Since we know the goal's length, finding that value in the z-array means the whole goal matched at that position, so we've found the pattern.
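
The same separator idea works for general pattern search. The following is a minimal sketch under stated assumptions: the helper name `find_occurrences` and its `sep` parameter are hypothetical, and the separator must occur in neither the pattern nor the text.

```python
def z_function(s):
    """Return the Z-array of s, with z[0] left at 0 by convention."""
    n = len(s)
    z = [0] * n
    l, r = 0, 0
    for i in range(1, n):
        if i <= r:
            z[i] = min(r - i + 1, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
    return z


def find_occurrences(pattern, text, sep="$"):
    """Return the start indices in text at which pattern occurs.

    Assumes sep occurs in neither pattern nor text.
    """
    if not pattern:
        return []
    z = z_function(pattern + sep + text)
    m = len(pattern)
    offset = m + 1  # index in the joined string where text begins
    # A Z-value equal to len(pattern) marks a full match at that position
    return [i - offset for i in range(offset, len(z)) if z[i] == m]


# "bcdea" is a rotation of "abcde", so it occurs in the doubled string
print(find_occurrences("bcdea", "abcde" + "abcde"))
```

Because the separator never matches, no Z-value after it can exceed the pattern length, which is why comparing against `len(pattern)` is sufficient.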
45123

46-
47-
$$Z-array$$ = $$0000000000400001$$
48-
49-
As we know what the goal's length is, and it's in our z-array, therefore we've found the pattern.
50-
### Implementation
51124
<Tabs>
52125
<TabItem value="py" label="Python">
53126
<SolutionAuthor name="@Cyber-Machine"/>
@@ -56,46 +129,39 @@ As we know what the goal's length is, and it's in our z-array, therefore we've f
class Solution:
    def rotateString(self, s, goal):
        l = len(s)
        # Concatenation of our initial string with itself
        s = s + s
        # Making a new pattern
        pattern = goal + '$' + s
        def z_function(s):
            n = len(s)
            # Initializing Z - array
            z = [0] * n
            # We need to create an interval to store the prefix substring
            l, r = 0, 0
            for i in range(1, n):
                # If i lies inside the interval [l, r], we already know that
                # S[i..] matches S[i - l..] for at least min(r - i + 1, z[i - l]) characters
                if i <= r:
                    z[i] = min(r - i + 1, z[i - l])
                # Checking the prefix length of current character
                while i + z[i] < n and s[z[i]] == s[i + z[i]]:
                    z[i] += 1
                # It is possible to match S[i..] to S[0..] for more than r - i + 1
                # characters, in this case we calculate the new interval.
                if i + z[i] - 1 > r:
                    l, r = i, i + z[i] - 1
            return z
        # Checking if our string exists in our pattern
        return len(goal) in z_function(pattern) and len(goal) == l
```
87155

88156
</TabItem>
89157
</Tabs>
90158

91-
92-
Let's take another example [459. Repeated Substring Pattern](https://leetcode.com/problems/repeated-substring-pattern/)
159+
## Example #2: [459. Repeated Substring Pattern](https://leetcode.com/problems/repeated-substring-pattern/)
93160

94161
> Given a string s, check if it can be constructed by taking a substring of it and appending multiple copies of the substring together.
95162
96163
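As in the previous question, we insert a special character, but this time after a candidate prefix. For every prefix length that divides the length of the string and is at most half of it, we build `s[:p] + "$" + s[p:]` and determine whether the rest of the string consists solely of repeated copies of that prefix.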
97164

98-
### Implementation
99165
<Tabs>
100166
<TabItem value="py" label="Python">
101167
<SolutionAuthor name="@Cyber-Machine"/>
@@ -109,53 +175,42 @@ class Solution(object):
            z = [0] * n
            # We need to create an interval to store the prefix substring
            l, r = 0, 0
            for i in range(1, n):
                # If i lies inside the interval [l, r], we already know that
                # S[i..] matches S[i - l..] for at least min(r - i + 1, z[i - l]) characters
                if i <= r:
                    z[i] = min(r - i + 1, z[i - l])
                # Checking the prefix length of current character
                while i + z[i] < n and s[z[i]] == s[i + z[i]]:
                    z[i] += 1
                # It is possible to match S[i..] to S[0..] for more than r - i + 1
                # characters, in this case we calculate the new interval.
                if i + z[i] - 1 > r:
                    l, r = i, i + z[i] - 1
            return z
        for i in range(len(s) // 2):
            if len(s) % (i + 1) == 0:
                z = z_function(s[:i + 1] + "$" + s[i + 1:])
                # Every block of length i + 1 after the separator must match the prefix.
                # Counting matches at any position would also accept overlapping
                # matches, e.g. "aaaaab" with the prefix "aa".
                if all(z[j] == i + 1 for j in range(i + 2, len(z), i + 1)):
                    return True
        return False
```
142199

143200
</TabItem>
144201
</Tabs>
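
As an alternative to building one Z-array per candidate prefix, the check can be done with a single Z-array of `s`. This is a sketch of a different technique from the solution above, relying on the fact that `s` has period `p` exactly when the suffix starting at `p` matches the prefix all the way to the end, that is, when `z[p] == n - p`. The function name `repeated_substring_pattern` is chosen for illustration.

```python
def z_function(s):
    """Return the Z-array of s, with z[0] left at 0 by convention."""
    n = len(s)
    z = [0] * n
    l, r = 0, 0
    for i in range(1, n):
        if i <= r:
            z[i] = min(r - i + 1, z[i - l])
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] - 1 > r:
            l, r = i, i + z[i] - 1
    return z


def repeated_substring_pattern(s):
    """Return True if s is two or more copies of one of its prefixes."""
    n = len(s)
    z = z_function(s)
    for p in range(1, n // 2 + 1):
        # p must divide n, and s[p:] must equal s[:n - p] (period p)
        if n % p == 0 and z[p] == n - p:
            return True
    return False


print(repeated_substring_pattern("abab"))
print(repeated_substring_pattern("aaaaab"))
```

This variant runs in $O(n)$ overall, since the Z-array is built once and each candidate length is then checked in constant time.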
145202

203+
## Example #3: [2223. Sum of Scores of Built Strings](https://leetcode.com/problems/sum-of-scores-of-built-strings/)
146204

147-
Let's look into another problem [2223. Sum of Scores of Built Strings](https://leetcode.com/problems/sum-of-scores-of-built-strings/)
148-
149-
>You are building a string s of length n one character at a time, prepending each new character to the front of the string. The strings are labeled from 1 to n, where the string with length i is labeled si.
205+
> You are building a string s of length n one character at a time, prepending each new character to the front of the string. The strings are labeled from 1 to n, where the string with length i is labeled si.
150206
>
151-
>For example, for s = "abaca", s1 == "a", s2 == "ca", s3 == "aca", etc.
152-
>The score of si is the length of the longest common prefix between si and sn (Note that s == sn).
207+
> For example, for s = "abaca", s1 == "a", s2 == "ca", s3 == "aca", etc.
208+
> The score of si is the length of the longest common prefix between si and sn (Note that s == sn).
153209
>
154-
>Given the final string s, return the sum of the score of every si.
210+
> Given the final string s, return the sum of the score of every si.
155211
156212
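Each built string is a suffix of s, so its score is the length of the longest common prefix between that suffix and s, which is exactly the corresponding value in our z-array. The answer is therefore the sum of all values in the z-array plus the length of the string, because the whole string matches itself completely while this implementation leaves the first z-value at 0.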
157213

158-
### Implementation
159214
<Tabs>
160215
<TabItem value="py" label="Python">
161216
<SolutionAuthor name="@Cyber-Machine"/>
@@ -169,31 +224,21 @@ class Solution:
            z = [0] * n
            # We need to create an interval to store the prefix substring
            l, r = 0, 0
            for i in range(1, n):
                # If i lies inside the interval [l, r], we already know that
                # S[i..] matches S[i - l..] for at least min(r - i + 1, z[i - l]) characters
                if i <= r:
                    z[i] = min(r - i + 1, z[i - l])
                # Checking the prefix length of current character
                while i + z[i] < n and s[z[i]] == s[i + z[i]]:
                    z[i] += 1
                # It is possible to match S[i..] to S[0..] for more than r - i + 1
                # characters, in this case we calculate the new interval.
                if i + z[i] - 1 > r:
                    l, r = i, i + z[i] - 1
            return z
        return sum(z_function(s)) + len(s)
```
192242

193243
</TabItem>
194244
</Tabs>
195-
196-
#### Time Complexity
197-
Since we are searching building our z-array of $$m$$ sized pattern in one iteration by comparing and updating it with $$n$$ sized prefix it takes $$O(m+n)$$ time.
198-
199-
For a detailed explanation [Click here](https://www.scaler.com/topics/data-structures/z-algorithm/)
