@@ -16,7 +16,7 @@ h3. Summary
1616
1717I previously mentioned that the `st_table` is a hash table. What is a hash
1818table? It is a data structure that records one-to-one relations, for
19- example, variable name and its value, function name and its body, etc.
19+ example, a variable name and its value, a function name and its body, etc.
2020
2121However, data structures other than hash tables can, of course, record
2222one-to-one relations. For example, a list of following data structure will
@@ -42,6 +42,7 @@ Now then, let us examine the `st_table`. But first, this library is not
4242created by Matsumoto, rather:
4343
4444▼ `st.c` credits
45+
4546<pre class="longlist">
4647 1 /* This is a public domain general purpose hash table package
4748 written by Peter Moore @ UCB. */
@@ -52,32 +53,32 @@ created by Matsumoto, rather:
5253as shown above.
5354
5455By the way, when I searched Google and found another version, it mentioned
55- that st_table is a contraction of "STring TABLE". However, I find it
56+ that `st_table` is a contraction of "STring TABLE". However, I find it
5657contradictory that it has both "general purpose" and "string" aspects.
5758
5859h4. What is a hash table?
5960
6061A hash table can be thought as the following: Let us think of an array with
61- n items. For example, let us make n=64 (figure 1).
62+ `n` items. For example, let us make `n`=64 (figure 1).
6263
6364!images/ch_name_array.png(Array)!
6465
65- Then let us specify a function f that takes a key and produces an integer i
66- from 0 to n-1 (0-63). We call this f a hash function. f when given the same
67- key always produces the i. For example, if we make the assumption that the
66+ Then let us specify a function f that takes a key and produces an integer `i`
67+ from 0 to `n`-1 (0-63). We call this `f` a hash function. `f` when given the same
68+ key always produces the `i`. For example, if we make the assumption that the
6869key is limited to positive integers, then if the key is divided by 64 then
6970the remainder will always fall between 0 and 63. This method of calculation
70- can become function f.
71+ can become function `f`.
7172
72- When recording relationships, given a key, function f generates i, and
73- place the value into index i of the array we have prepared. In other words,
73+ When recording relationships, given a key, function `f` generates `i`, and
74+ place the value into index `i` of the array we have prepared. In other words,
7475the index access into an array is very fast. Therefore the fundamental idea
75- is to change the key into a integer.
76+ is to change the key into an integer.
7677
7778!images/ch_name_aset.png(Array assignment)!
7879
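The idea so far can be sketched in C roughly as follows. This is only an illustration of "turn the key into an index", assuming non-negative integer keys and `n`=64; the names `slot_value`, `slot_used`, `f`, `record` and `lookup` are made up for this sketch and do not appear in `st.c`.

```c
#define N_SLOTS 64                 /* n = 64, as in the text */

/* Hypothetical storage: one value per slot, plus an "occupied" flag. */
static int slot_value[N_SLOTS];
static int slot_used[N_SLOTS];

/* The hash function f: the same key always yields the same index 0..63. */
unsigned int f(unsigned int key)
{
    return key % N_SLOTS;
}

/* Record key -> value by writing into index f(key).
 * Collisions are not handled: a later key with the same index
 * simply overwrites the earlier value. */
void record(unsigned int key, int value)
{
    unsigned int i = f(key);
    slot_value[i] = value;
    slot_used[i] = 1;
}

/* Return 1 and store the value if slot f(key) is occupied, 0 otherwise.
 * Only values are stored, so this cannot tell which key filled the slot. */
int lookup(unsigned int key, int *value)
{
    unsigned int i = f(key);
    if (!slot_used[i])
        return 0;
    *value = slot_value[i];
    return 1;
}
```

Both `record` and `lookup` cost one remainder operation and one array access, which is where the speed comes from.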
7980However, in the real world it isn't that easy. There is a critical problem
80- with this idea. Because n is only 64, if there are more than 64
81+ with this idea. Because `n` is only 64, if there are more than 64
8182relationships to be recorded, it is certain that i will be the same for two different keys.
8283It is also possible that with fewer than 64, the same thing can occur.
8384For example, given the previous hash function "key % 64", keys 65 and 129
@@ -111,6 +112,7 @@ chapter, if there is data and code, it is better to read the data first.
111112The following is the data type of `st_table`.
112113
113114▼ `st_table`
115+
114116<pre class="longlist">
115117 9 typedef struct st_table st_table;
116118
@@ -125,6 +127,7 @@ The following is the data type of `st_table`.
125127</pre>
126128
127129▼ `struct st_table_entry`
130+
128131<pre class="longlist">
129132 16 struct st_table_entry {
130133 17 unsigned int hash;
@@ -138,15 +141,17 @@ The following is the data type of `st_table`.
138141
139142`st_table` is the main table structure. `st_table_entry` is a holder that
140143stores one value. `st_table_entry` contains a member called `next` which of
141- course is used to make `st_table_entry` into a linked list. This is the chain part of the chaining method.
142- The `st_hash_type` data type is used, but I will explain this later. First let
143- me explain the other parts so you can compare and understand the roles.
144+ course is used to make `st_table_entry` into a linked list. This is the chain
145+ part of the chaining method. The `st_hash_type` data type is used, but I will
146+ explain this later. First let me explain the other parts so you can compare
147+ and understand the roles.
144148
145149!images/ch_name_sttable.png(`st_table` data structure)!
146150
147151So, let us comment on `st_hash_type`.
148152
149153▼ `struct st_hash_type`
154+
150155<pre class="longlist">
151156 11 struct st_hash_type {
152157 12 int (*compare)(); /* comparison function */
@@ -187,11 +192,11 @@ And it is called like this:
187192</pre>
188193
189194Here let us return to the `st_hash_type` commentary. Of the two members
190- `hash` and `compare`, `hash` is the hash function f explained previously.
195+ `hash` and `compare`, `hash` is the hash function `f` explained previously.
191196
192197On the other hand, `compare` is a function that evaluates if the key is actually the
193198same or not. With the chaining method, in the spot with the same hash value
194- n, multiple elements can be inserted. To know exactly which element is
199+ `n`, multiple elements can be inserted. To know exactly which element is
195200being searched for, this time it is necessary to use a comparison function
196201that we can absolutely trust. `compare` will be that function.
197202
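To make the roles of `hash` and `compare` concrete, here is a minimal sketch of a chained lookup. It is not the code in `st.c`: the types `key_type`, `entry` and `table`, the functions `table_add` and `table_lookup`, and the fixed bin count are all assumptions made for this illustration. I also assume that `compare` returns 0 when two keys are equal.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration; not the definitions in st.c. */
struct key_type {
    int (*compare)(const void *a, const void *b);  /* 0 when keys are equal */
    unsigned int (*hash)(const void *key);
};

struct entry {
    const void *key;
    int value;
    struct entry *next;            /* the "chain" of the chaining method */
};

#define N_BINS 4

struct table {
    const struct key_type *type;
    struct entry *bins[N_BINS];
};

int str_compare(const void *a, const void *b)
{
    return strcmp(a, b);
}

/* Deliberately weak hash (first byte only) so collisions are easy to provoke. */
unsigned int str_hash(const void *key)
{
    return (unsigned char)((const char *)key)[0];
}

const struct key_type string_keys = { str_compare, str_hash };

/* Add a new entry at the head of its bin, without checking for duplicates.
 * Returns 0 on success, -1 if memory allocation fails. */
int table_add(struct table *t, const void *key, int value)
{
    unsigned int pos = t->type->hash(key) % N_BINS;
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->key = key;
    e->value = value;
    e->next = t->bins[pos];
    t->bins[pos] = e;
    return 0;
}

/* hash picks the bin; compare picks the right entry within that bin. */
int table_lookup(const struct table *t, const void *key, int *value)
{
    unsigned int pos = t->type->hash(key) % N_BINS;
    const struct entry *e;
    for (e = t->bins[pos]; e != NULL; e = e->next) {
        if (t->type->compare(key, e->key) == 0) {
            *value = e->value;
            return 1;
        }
    }
    return 0;
}
```

With this weak hash, "apple" and "avocado" land in the same bin, and only the `compare` call tells them apart during lookup.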
@@ -202,7 +207,7 @@ same kind of hash for each (data type) is foolish. Usually, the things
202207that change with the different key data types are things like the hash
203208function. For things like memory allocation and collision detection,
204209typically most of the code is the same. Only the parts where the
205- implementation changes with a differing data type will bundled up into a
210+ implementation changes with a differing data type will be bundled up into a
206211function, and a pointer to that function will be used. In this fashion, the
207212majority of the code that makes up the hash table implementation can
208213use it.
@@ -223,6 +228,7 @@ introduced in the previous chapter. This function creates a table for
223228integer data type keys.
224229
225230▼ `st_init_numtable()`
231+
226232<pre class="longlist">
227233 182 st_table*
228234 183 st_init_numtable()
@@ -238,6 +244,7 @@ on. `type_numhash` is an `st_hash_type` (it is the member named "type" of `st_ta
238244Regarding this `type_numhash`:
239245
240246▼ `type_numhash`
247+
241248<pre class="longlist">
242249 37 static struct st_hash_type type_numhash = {
243250 38 numcmp,
@@ -271,6 +278,7 @@ it's a good idea to look at the function that does the searching. Shown below is
271278function that searches the hash table, `st_lookup()`.
272279
273280▼ `st_lookup()`
281+
274282<pre class="longlist">
275283 247 int
276284 248 st_lookup(table, key, value)
@@ -300,6 +308,7 @@ The important parts are pretty much in `do_hash()` and `FIND_ENTRY()`. Let us
300308look at them in order.
301309
302310▼ `do_hash()`
311+
303312<pre class="longlist">
304313 68 #define do_hash(key,table) (unsigned int)(*(table)->type->hash)((key))
305314
@@ -320,6 +329,7 @@ function `type->hash` for each data type.
320329Next, let us examine `FIND_ENTRY()`.
321330
322331▼ `FIND_ENTRY()`
332+
323333<pre class="longlist">
324334 235 #define FIND_ENTRY(table, ptr, hash_val, bin_pos) do {\
325335 236 bin_pos = hash_val%(table)->num_bins;\
@@ -376,6 +386,7 @@ already registered. It always adds a new entry. This is the meaning of `direct`
376386in the function name.
377387
378388▼ `st_add_direct()`
389+
379390<pre class="longlist">
380391 308 void
381392 309 st_add_direct(table, key, value)
@@ -401,6 +412,7 @@ Then the insertion operation seems to be implemented by `ADD_DIRECT()`.
401412Since the name is all uppercase, we can anticipate that is a macro.
402413
403414▼ `ADD_DIRECT()`
415+
404416<pre class="longlist">
405417 268 #define ADD_DIRECT(table, key, value, hash_val, bin_pos) \
406418 269 do { \
@@ -445,6 +457,7 @@ is NULL, this code holds true.
445457Now, let me explain the code I left aside.
446458
447459▼ `ADD_DIRECT()`-`rehash`
460+
448461<pre class="longlist">
449462 271 if (table->num_entries / (table->num_bins) \
450463 > ST_DEFAULT_MAX_DENSITY) { \
@@ -464,6 +477,7 @@ per bin become too many, `bin` is increased and the crowding is reduced.
464477The current `ST_DEFAULT_MAX_DENSITY` is
465478
466479▼ `ST_DEFAULT_MAX_DENSITY`
480+
467481<pre class="longlist">
468482 23 #define ST_DEFAULT_MAX_DENSITY 5
469483
@@ -479,6 +493,7 @@ h3. `st_insert()`
479493`st_lookup()`, so if you understand those two, this will be easy.
480494
481495▼ `st_insert()`
496+
482497<pre class="longlist">
483498 286 int
484499 287 st_insert(table, key, value)
@@ -521,6 +536,7 @@ The conversion from string to `ID` is executed by `rb_intern()`. This function
521536is rather long, so let's omit the middle.
522537
523538▼ `rb_intern()` (simplified)
539+
524540<pre class="longlist">
5255415451 static st_table *sym_tbl; /* char* to ID */
5265425452 static st_table *sym_rev_tbl; /* ID to char* */
@@ -571,6 +587,7 @@ This function also sets the `ID` classification flags so it is long. Let me
571587simplify it.
572588
573589▼ `rb_id2name()` (simplified)
590+
574591<pre class="longlist">
575592char *
576593rb_id2name(id)
@@ -605,6 +622,7 @@ And it can be obtained like so: `"string".intern`. The implementation of
605622`String#intern` is `rb_str_intern()`.
606623
607624▼ `rb_str_intern()`
625+
608626<pre class="longlist">
6096272996 static VALUE
6106282997 rb_str_intern(str)
@@ -636,6 +654,7 @@ And the reverse operation is accomplished using `Symbol#to_s` and such.
636654The implementation is in `sym_to_s`.
637655
638656▼ `sym_to_s()`
657+
639658<pre class="longlist">
640659 522 static VALUE
641660 523 sym_to_s(sym)