Commit 8ed3497

Tweaks.
1 parent 196704d commit 8ed3497

1 file changed

+21
-68
lines changed

README.md

Lines changed: 21 additions & 68 deletions
@@ -111,12 +111,12 @@ But the overloaded version `compressor.compress(String)` already calls it automa
111111

112112
### Where to store the compressed data?
113113
In its purest form, a `String` is just a byte array (`byte[]`), and a compressed `String` couldn't be different.
114-
You can store it anywhere you would store a `byte[]`.
115-
The most common approach is to store each compressed string ordered in memory using a `byte[][]` (for binary search) or
116-
a B+Tree if you need frequent insertions (coming in the next release).
117-
The frequency of reads and writes + business requirements will tell the best media and data structure to use.
114+
You can store it anywhere you would store a `byte[]`. If you are compressing millions of different entries, a very common
115+
approach is to store each compressed string ordered in memory using a `byte[][]` (for binary search) or a B+Tree if you
116+
need frequent insertions (coming in the next release). The frequency of reads and writes, together with business requirements, will
117+
determine the best storage medium and data structure to use.
118118

119-
If the data is ordered before compression and stored in-memory in a `byte[][]`, you can use the full power of the binary
119+
If the data is ordered before compression and stored in-memory in a `byte[][]` as mentioned above, you can use the full power of the binary
120120
search directly in the compressed data through `FourBitBinarySearch`, `FiveBitBinarySearch`, and `SixBitBinarySearch`.
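The idea of keeping entries ordered in a `byte[][]` and searching them without decoding can be sketched with plain JDK calls. This is only an illustration, not the library's `FourBitBinarySearch`/`FiveBitBinarySearch`/`SixBitBinarySearch` implementation: the class `SortedByteArraySearch` and its methods are hypothetical, and ISO-8859-1 bytes stand in for compressed entries.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: illustrates binary search over an ordered byte[][].
// Plain ISO-8859-1 bytes are used as a stand-in for compressed entries.
final class SortedByteArraySearch {

    // Encodes each value and sorts the entries in unsigned lexicographic order.
    static byte[][] toSortedEntries(String... values) {
        byte[][] entries = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            entries[i] = values[i].getBytes(StandardCharsets.ISO_8859_1);
        }
        Arrays.sort(entries, Arrays::compareUnsigned);
        return entries;
    }

    // Returns the index of the key, or a negative value if it is absent.
    // The comparator must match the order used when sorting.
    static int search(byte[][] sorted, byte[] key) {
        return Arrays.binarySearch(sorted, key, Arrays::compareUnsigned);
    }

    // Convenience overload that encodes the key the same way as the entries.
    static int find(byte[][] sorted, String key) {
        return search(sorted, key.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        byte[][] entries = toSortedEntries("charlie", "alpha", "bravo");
        System.out.println(find(entries, "bravo"));
        System.out.println(find(entries, "delta") < 0);
    }
}
```

The same comparator is used for sorting and for searching; if the two orders differ, the result of a binary search is undefined.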
121121

122122
### Binary search
@@ -146,18 +146,20 @@ int index = binary.search("63821623849863628763#");
146146

147147
if (index >= 0) {
148148
byte[] found = compressedData[index];
149-
String decompressed = compressor.decompress(found);
150-
}
149+
String decompressed = compressor.decompress(found);
150+
```
151+
If you are using a custom character set to compress the data, you need to pass it to the binary search constructor:
152+
```java
153+
public FiveBitBinarySearch(byte[][] compressedData, boolean prefixSearch, byte[] charset)
151154
```
152-
153-
In case you used a custom character set to compress
154155

155156
### B+Tree
156157

157158
Coming in the next release.
158159

159160
### Bulk / Batch compression
160161

162+
In some rare cases you need to fetch your data in batches from a remote location or a third-party source.
161163
java-string-compressor provides both `BulkCompressor` and `ManagedBulkCompressor` specifically for this task.
162164
They help you automate the process of adding each batch to the correct position in the destination array where the
163165
compressed data will be stored. Both currently support `byte[][]` as the destination for the compressed data.
@@ -168,67 +170,18 @@ from handle array positions and bounds. This is why we recommend `ManagedBulkCom
168170

169171
Both bulk compressors loop through the data in parallel by calling `IntStream.range().parallel()`.
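As a rough sketch of what a parallel batch fill with `IntStream.range().parallel()` can look like, the class below writes each encoded string of a batch into the next free slots of a `byte[][]`. It is not the source of `BulkCompressor` or `ManagedBulkCompressor`: `BatchFiller`, its `addAll` method, and the `encoder` function are hypothetical names, and the encoder stands in for a real compressor.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.function.Function;
import java.util.stream.IntStream;

// Hypothetical helper: tracks the next free position in the destination array
// and fills it batch by batch, encoding the entries of each batch in parallel.
final class BatchFiller {
    private final byte[][] destination;
    private final Function<String, byte[]> encoder; // stand-in for a compressor
    private int next; // index of the first free slot

    BatchFiller(byte[][] destination, Function<String, byte[]> encoder) {
        this.destination = destination;
        this.encoder = encoder;
    }

    // Encodes the batch and stores it in consecutive slots.
    // Returns the total number of entries stored so far.
    // Not safe to call from several threads at once.
    int addAll(List<String> batch) {
        if (batch.size() > destination.length - next) {
            throw new IndexOutOfBoundsException("Batch does not fit in the destination array");
        }
        final int offset = next;
        // Each index writes to its own slot, so the parallel loop needs no locking.
        IntStream.range(0, batch.size())
                 .parallel()
                 .forEach(i -> destination[offset + i] = encoder.apply(batch.get(i)));
        next = offset + batch.size();
        return next;
    }

    public static void main(String[] args) {
        byte[][] data = new byte[5][];
        BatchFiller filler =
            new BatchFiller(data, s -> s.getBytes(StandardCharsets.ISO_8859_1));
        filler.addAll(List.of("a", "b"));
        System.out.println(filler.addAll(List.of("c")));
    }
}
```

Keeping the position counter inside the helper is what frees the caller from handling array offsets and bounds between batches.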
170172

171-
Let's take `compactedData` from the previous example and show how we can populate it with data from all customers:
172-
173173
```java
174-
byte[][] compactedData = new byte[100000000][]; // Data for 100 million customers.
175-
176-
177-
178-
179-
180-
181-
182-
byte[] compressed = compressor.compress(input);
183-
byte[] decompressed = compressor.decompress(compressed);
184-
String string = new String(decompressed, StandardCharsets.ISO_8859_1);
174+
byte[][] compressedData = new byte[100000000][]; // Storage for a max of 100 million customers.
175+
// ...
176+
ManagedBulkCompressor managed = new ManagedBulkCompressor(compressor, compressedData);
177+
// ...loop...
178+
managed.compressAndAddAll(batch); // batch is the list of strings to be compressed.
185179
```
186180

187-
188-
`BulkCompressor` is a "lower-level" utility where
189-
190-
191-
181+
### Logging
182+
If you need logging, search for libraries like ZeroLog, ChronicleLog, Log4j 2 Async Loggers, and other similar tools
183+
(we did not test any of those). You will need a fast logging library, or logging can become a bottleneck.
192184

193185
### Other
194-
Do not forget to check our JavaDocs with further information about each member.
195-
196-
197-
198-
199-
200-
201-
202-
203-
204-
205-
206-
207-
<br>
208-
<br>
209-
<br>
210-
<br>
211-
<br>
212-
<br>
213-
<br>
214-
<br>
215-
<br>
216-
<br>
217-
<br>
218-
<br>
219-
<br>
220-
<br>
221-
<br>
222-
<br>
223-
224-
225-
226-
227-
228-
229-
230-
231-
232-
233-
234-
if you need logging , check ZeroLog, ChronicleLog and similar tools
186+
Do not forget to check the JavaDocs for further information about each member.
187+
Also check the test directory for additional examples.
