Skip to content

Commit fd226a8

Browse files
committed
Merge branch 'en/xdiff-cleanup-2' into seen
Code clean-up. Comments? * en/xdiff-cleanup-2: xdiff: rename rindex -> reference_index xdiff: change rindex from long to size_t in xdfile_t xdiff: make xdfile_t.nreff a size_t instead of long xdiff: make xdfile_t.nrec a size_t instead of long xdiff: split xrecord_t.ha into line_hash and minimal_perfect_hash xdiff: use unambiguous types in xdl_hash_record() xdiff: use size_t for xrecord_t.size xdiff: make xrecord_t.ptr a uint8_t instead of char xdiff: use ssize_t for dstart/dend, make them last in xdfile_t doc: define unambiguous type mappings across C and Rust
2 parents 9a7672c + 040fc0b commit fd226a8

File tree

11 files changed

+338
-109
lines changed

11 files changed

+338
-109
lines changed
Lines changed: 229 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,229 @@
1+
= Unambiguous types
2+
3+
Most of these mappings are obvious, but there are some nuances and gotchas with
4+
Rust FFI (Foreign Function Interface).
5+
6+
This document defines clear, one-to-one mappings between primitive types in C,
7+
Rust (and possible other languages in the future). Its purpose is to eliminate
8+
ambiguity in type widths, signedness, and binary representation across
9+
platforms and languages.
10+
11+
For Git, the only header required to use these unambiguous types in C is
12+
`git-compat-util.h`.
13+
14+
== Boolean types
15+
[cols="1,1", options="header"]
16+
|===
17+
| C Type | Rust Type
18+
| bool^1^ | bool
19+
|===
20+
21+
== Integer types
22+
23+
In C, `<stdint.h>` (or an equivalent) must be included.
24+
25+
[cols="1,1", options="header"]
26+
|===
27+
| C Type | Rust Type
28+
| uint8_t | u8
29+
| uint16_t | u16
30+
| uint32_t | u32
31+
| uint64_t | u64
32+
33+
| int8_t | i8
34+
| int16_t | i16
35+
| int32_t | i32
36+
| int64_t | i64
37+
|===
38+
39+
== Floating-point types
40+
41+
Rust requires IEEE-754 semantics.
42+
In C, that is typically true, but not guaranteed by the standard.
43+
44+
[cols="1,1", options="header"]
45+
|===
46+
| C Type | Rust Type
47+
| float^2^ | f32
48+
| double^2^ | f64
49+
|===
50+
51+
== Size types
52+
53+
These types represent pointer-sized integers and are typically defined in
54+
`<stddef.h>` or an equivalent header.
55+
56+
Size types should be used any time pointer arithmetic is performed e.g.
57+
indexing an array, describing the number of elements in memory, etc...
58+
59+
[cols="1,1", options="header"]
60+
|===
61+
| C Type | Rust Type
62+
| size_t^3^ | usize
63+
| ptrdiff_t^4^ | isize
64+
|===
65+
66+
== Character types
67+
68+
This is where C and Rust don't have a clean one-to-one mapping. A C `char` is
69+
an 8-bit type that is signless (neither signed nor unsigned) which causes
70+
problems with e.g. `make DEVELOPER=1`. Rust's `char` type is an unsigned 32-bit
71+
integer that is used to describe Unicode code points. Even though a C `char`
72+
is the same width as `u8`, `char` should be converted to u8 where it is
73+
describing bytes in memory. If a C `char` is not describing bytes, then it
74+
should be converted to a more accurate unambiguous type.
75+
76+
While you could specify `char` in the C code and `u8` in Rust code, it's not as
77+
clear what the appropriate type is, but it would work across the FFI boundary.
78+
However the bigger problem comes from code generation tools like cbindgen and
79+
bindgen. When cbindgen see u8 in Rust it will generate uint8_t on the C side
80+
which will cause differ in signedness warnings/errors. Similaraly if bindgen
81+
see `char` on the C side it will generate `std::ffi::c_char` which has its own
82+
problems.
83+
84+
=== Notes
85+
^1^ This is only true if stdbool.h (or equivalent) is used. +
86+
^2^ C does not enforce IEEE-754 compatibility, but Rust expects it. If the
87+
platform/arch for C does not follow IEEE-754 then this equivalence does not
88+
hold. Also, it's assumed that `float` is 32 bits and `double` is 64, but
89+
there may be a strange platform/arch where even this isn't true. +
90+
^3^ C also defines uintptr_t, but this should not be used in Git. +
91+
^4^ C also defines ssize_t and intptr_t, but these should not be used in Git. +
92+
93+
== Problems with std::ffi::c_* types in Rust
94+
TL;DR: They're not guaranteed to match C types for all possible C
95+
compilers/platforms/architectures.
96+
97+
Only a few of Rust's C FFI types are considered safe and semantically clear to
98+
use: +
99+
100+
* `c_void`
101+
* `CStr`
102+
* `CString`
103+
104+
Even then, they should be used sparingly, and only where the semantics match
105+
exactly.
106+
107+
The std::os::raw::c_* (which is deprecated) directly inherits the problems of
108+
core::ffi, which changes over time and seems to make a best guess at the
109+
correct definition for a given platform/target. This probably isn't a problem
110+
for all platforms that Rust supports currently, but can anyone say that Rust
111+
got it right for all C compilers of all platforms/targets?
112+
113+
On top of all of that we're targeting an older version of Rust which doesn't
114+
have the latest mappings.
115+
116+
To give an example: c_long is defined in
117+
footnote:[https://doc.rust-lang.org/1.63.0/src/core/ffi/mod.rs.html#175-189[c_long in 1.63.0]]
118+
footnote:[https://doc.rust-lang.org/1.89.0/src/core/ffi/primitives.rs.html#135-151[c_long in 1.89.0]]
119+
120+
=== Rust version 1.63.0
121+
122+
[source]
123+
----
124+
mod c_long_definition {
125+
cfg_if! {
126+
if #[cfg(all(target_pointer_width = "64", not(windows)))] {
127+
pub type c_long = i64;
128+
pub type NonZero_c_long = crate::num::NonZeroI64;
129+
pub type c_ulong = u64;
130+
pub type NonZero_c_ulong = crate::num::NonZeroU64;
131+
} else {
132+
// The minimal size of `long` in the C standard is 32 bits
133+
pub type c_long = i32;
134+
pub type NonZero_c_long = crate::num::NonZeroI32;
135+
pub type c_ulong = u32;
136+
pub type NonZero_c_ulong = crate::num::NonZeroU32;
137+
}
138+
}
139+
}
140+
----
141+
142+
=== Rust version 1.89.0
143+
144+
[source]
145+
----
146+
mod c_long_definition {
147+
crate::cfg_select! {
148+
any(
149+
all(target_pointer_width = "64", not(windows)),
150+
// wasm32 Linux ABI uses 64-bit long
151+
all(target_arch = "wasm32", target_os = "linux")
152+
) => {
153+
pub(super) type c_long = i64;
154+
pub(super) type c_ulong = u64;
155+
}
156+
_ => {
157+
// The minimal size of `long` in the C standard is 32 bits
158+
pub(super) type c_long = i32;
159+
pub(super) type c_ulong = u32;
160+
}
161+
}
162+
}
163+
----
164+
165+
Even for the cases where C types are correctly mapped to Rust types via
166+
std::ffi::c_* there are still problems. Let's take c_char for example. On some
167+
platforms it's u8 on others it's i8.
168+
169+
=== Subtraction underflow in debug mode
170+
171+
The following code will panic in debug on platforms that define c_char as u8,
172+
but won't if it's an i8.
173+
174+
[source]
175+
----
176+
let mut x: std::ffi::c_char = 0;
177+
x -= 1;
178+
----
179+
180+
=== Inconsistent shift behavior
181+
182+
`x` will be 0xC0 for platforms that use i8, but will be 0x40 where it's u8.
183+
184+
[source]
185+
----
186+
let mut x: std::ffi::c_char = 0x80;
187+
x >>= 1;
188+
----
189+
190+
=== Equality fails to compile on some platforms
191+
192+
The following will not compile on platforms that define c_char as i8, but will
193+
if it's u8. You can cast x e.g. `assert_eq!(x as u8, b'a');`, but then you get
194+
a warning on platforms that use u8 and a clean compilation where i8 is used.
195+
196+
[source]
197+
----
198+
let mut x: std::ffi::c_char = 0x61;
199+
assert_eq!(x, b'a');
200+
----
201+
202+
== Enum types
203+
Rust enum types should not be used as FFI types. Rust enum types are more like
204+
C union types than C enum's. For something like:
205+
206+
[source]
207+
----
208+
#[repr(C, u8)]
209+
enum Fruit {
210+
Apple,
211+
Banana,
212+
Cherry,
213+
}
214+
----
215+
216+
It's easy enough to make sure the Rust enum matches what C would expect, but a
217+
more complex type like.
218+
219+
[source]
220+
----
221+
enum HashResult {
222+
SHA1([u8; 20]),
223+
SHA256([u8; 32]),
224+
}
225+
----
226+
227+
The Rust compiler has to add a discriminant to the enum to distinguish between
228+
the variants. The width, location, and values for that discriminant is up to
229+
the Rust compiler and is not ABI stable.

xdiff-interface.c

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -300,7 +300,7 @@ void xdiff_clear_find_func(xdemitconf_t *xecfg)
300300

301301
unsigned long xdiff_hash_string(const char *s, size_t len, long flags)
302302
{
303-
return xdl_hash_record(&s, s + len, flags);
303+
return xdl_hash_record((uint8_t const**)&s, (uint8_t const*)s + len, flags);
304304
}
305305

306306
int xdiff_compare_lines(const char *l1, long s1,

xdiff/xdiffi.c

Lines changed: 14 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -22,9 +22,9 @@
2222

2323
#include "xinclude.h"
2424

25-
static unsigned long get_hash(xdfile_t *xdf, long index)
25+
static size_t get_hash(xdfile_t *xdf, long index)
2626
{
27-
return xdf->recs[xdf->rindex[index]].ha;
27+
return xdf->recs[xdf->reference_index[index]].minimal_perfect_hash;
2828
}
2929

3030
#define XDL_MAX_COST_MIN 256
@@ -278,10 +278,10 @@ int xdl_recs_cmp(xdfile_t *xdf1, long off1, long lim1,
278278
*/
279279
if (off1 == lim1) {
280280
for (; off2 < lim2; off2++)
281-
xdf2->changed[xdf2->rindex[off2]] = true;
281+
xdf2->changed[xdf2->reference_index[off2]] = true;
282282
} else if (off2 == lim2) {
283283
for (; off1 < lim1; off1++)
284-
xdf1->changed[xdf1->rindex[off1]] = true;
284+
xdf1->changed[xdf1->reference_index[off1]] = true;
285285
} else {
286286
xdpsplit_t spl;
287287
spl.i1 = spl.i2 = 0;
@@ -385,7 +385,7 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long i1, long i2, long chg1,
385385

386386
static int recs_match(xrecord_t *rec1, xrecord_t *rec2)
387387
{
388-
return (rec1->ha == rec2->ha);
388+
return rec1->minimal_perfect_hash == rec2->minimal_perfect_hash;
389389
}
390390

391391
/*
@@ -403,11 +403,10 @@ static int recs_match(xrecord_t *rec1, xrecord_t *rec2)
403403
*/
404404
static int get_indent(xrecord_t *rec)
405405
{
406-
long i;
407406
int ret = 0;
408407

409-
for (i = 0; i < rec->size; i++) {
410-
char c = rec->ptr[i];
408+
for (size_t i = 0; i < rec->size; i++) {
409+
uint8_t c = rec->ptr[i];
411410

412411
if (!XDL_ISSPACE(c))
413412
return ret;
@@ -484,7 +483,7 @@ static void measure_split(const xdfile_t *xdf, long split,
484483
{
485484
long i;
486485

487-
if (split >= xdf->nrec) {
486+
if (split >= (long)xdf->nrec) {
488487
m->end_of_file = 1;
489488
m->indent = -1;
490489
} else {
@@ -507,7 +506,7 @@ static void measure_split(const xdfile_t *xdf, long split,
507506

508507
m->post_blank = 0;
509508
m->post_indent = -1;
510-
for (i = split + 1; i < xdf->nrec; i++) {
509+
for (i = split + 1; i < (long)xdf->nrec; i++) {
511510
m->post_indent = get_indent(&xdf->recs[i]);
512511
if (m->post_indent != -1)
513512
break;
@@ -718,7 +717,7 @@ static void group_init(xdfile_t *xdf, struct xdlgroup *g)
718717
*/
719718
static inline int group_next(xdfile_t *xdf, struct xdlgroup *g)
720719
{
721-
if (g->end == xdf->nrec)
720+
if (g->end == (long)xdf->nrec)
722721
return -1;
723722

724723
g->start = g->end + 1;
@@ -751,7 +750,7 @@ static inline int group_previous(xdfile_t *xdf, struct xdlgroup *g)
751750
*/
752751
static int group_slide_down(xdfile_t *xdf, struct xdlgroup *g)
753752
{
754-
if (g->end < xdf->nrec &&
753+
if (g->end < (long)xdf->nrec &&
755754
recs_match(&xdf->recs[g->start], &xdf->recs[g->end])) {
756755
xdf->changed[g->start++] = false;
757756
xdf->changed[g->end++] = true;
@@ -993,11 +992,11 @@ static void xdl_mark_ignorable_lines(xdchange_t *xscr, xdfenv_t *xe, long flags)
993992

994993
rec = &xe->xdf1.recs[xch->i1];
995994
for (i = 0; i < xch->chg1 && ignore; i++)
996-
ignore = xdl_blankline(rec[i].ptr, rec[i].size, flags);
995+
ignore = xdl_blankline((const char *)rec[i].ptr, (long)rec[i].size, flags);
997996

998997
rec = &xe->xdf2.recs[xch->i2];
999998
for (i = 0; i < xch->chg2 && ignore; i++)
1000-
ignore = xdl_blankline(rec[i].ptr, rec[i].size, flags);
999+
ignore = xdl_blankline((const char *)rec[i].ptr, (long)rec[i].size, flags);
10011000

10021001
xch->ignore = ignore;
10031002
}
@@ -1008,7 +1007,7 @@ static int record_matches_regex(xrecord_t *rec, xpparam_t const *xpp) {
10081007
size_t i;
10091008

10101009
for (i = 0; i < xpp->ignore_regex_nr; i++)
1011-
if (!regexec_buf(xpp->ignore_regex[i], rec->ptr, rec->size, 1,
1010+
if (!regexec_buf(xpp->ignore_regex[i], (const char *)rec->ptr, rec->size, 1,
10121011
&regmatch, 0))
10131012
return 1;
10141013

0 commit comments

Comments
 (0)