In this example:
#[target_clones("[x86|x86_64]+avx")]
fn foo() { /* snip */ }
#[target_clones("[x86|x86_64]+avx")]
fn bar() { foo(); }
foo should be statically dispatched when invoked in bar, since the CPU features have already been established when dispatching bar. It would also be nice if this worked when the functions have mismatched feature sets: a function compiled for x86+sse+avx should be able to statically dispatch to functions that only require x86+sse.
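To make the request concrete, here is a rough hand-written sketch of what the desired expansion could look like. This is not the actual macro output; the names foo_avx, foo_default, bar_avx and bar_default are hypothetical, and the functions return a string only so the chosen path is observable. The point is that the AVX clone of bar calls the AVX clone of foo directly, and only the outer bar performs a runtime check.

```rust
// Hypothetical expansion, written by hand for illustration only.

// AVX clone of foo (hypothetical name).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn foo_avx() -> &'static str {
    "avx"
}

// Fallback clone of foo (hypothetical name).
fn foo_default() -> &'static str {
    "default"
}

// AVX clone of bar: AVX support was already checked by whoever
// dispatched here, so foo's AVX clone is called directly with no
// second runtime check.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
#[target_feature(enable = "avx")]
unsafe fn bar_avx() -> &'static str {
    // SAFETY: this function is only reached after AVX was detected.
    unsafe { foo_avx() }
}

// Fallback clone of bar.
fn bar_default() -> &'static str {
    foo_default()
}

// The only place where runtime feature detection happens.
pub fn bar() -> &'static str {
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("avx") {
            // SAFETY: AVX support was just confirmed at runtime.
            return unsafe { bar_avx() };
        }
    }
    bar_default()
}
```

The mismatched case would presumably follow the same pattern: a clone compiled for sse+avx could call a clone that only requires sse directly, because the callee's requirements are a subset of the features already established.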