Implement broadcasting for biosequences

### TL;DR
For v3, we should add broadcasting to BioSequence, and make `my_seq[1:4] = DNA_A` error.

### Rationale

I was working on #133 , but can't get it to work properly. When calling `getindex` and `setindex!` with BioSequences, there always seems to be an edge case where dispatch goes the wrong way, or something is not implemented, or you get a method ambiguity error. I remember thinking the same when doing #120 .

I hate method ambiguity errors. They're a critical error in the sense that they completely break your workflow. They can't be found statically, and they're extremely hard to test for. The `LongSequence` test suite is compresensive, but I can guarantee you, I can find method ambiguity in something as simple as `setindex!` there.

Why is this so hard? Why is it so hard for BioSequences in particular?

I've come to think *one* of the reasons is that `my_seq[inds] = x` has two distinct *semtantic* meaning in BioSequences right now, namely these two:

```
julia> my_seq = dna"GGCTA";

julia> my_seq[2:4] = dna"AAC"; my_seq
5nt DNA Sequence:
GAACA

julia> my_seq[2:4] = DNA_M; my_seq
5nt DNA Sequence:
GMMMA
```
Having two distinct semantic meanings covered by the same function is a bad idea. Here's what happens if you do it.
1) Implement both methods. But oh, we need to have them be different signatures to dispatch to the right one. Let's implement `setindex!(::BioSequence, ::Any, ::Vector{Bool})` and `setindex!(::BioSequence, ::BioSequence, ::Vector{Bool})` separately. This works __ONLY__ because one signature is more specific than the other, __NOT__ because the signatures are incompatible. Note that we already loses some flexibility, because we now can't do `my_seq[1:2] = [DNA_A, DNA_C]`, since that would call the wrong method.
2) Make new biosequence e.g. `LongSequence` or `LongSubSeq`. We need to specialize one of the setindex! methods, so we add `setindex!(::LongSequence, ::Any, ::Vector{Bool})`.
3) That new method is now more specific than the fallback (of course), but that now messes up the original dispatch, that was built on one method being more specific than the other. Method ambiguity, so we add  `setindex!(::LongSequence, ::LongSequence, ::Vector{Bool})`. We lose even more flexibility, and the original method error may still be exposed if you try to do `my_seq[1:2] = mer"TA"`. Also, we have to duplicate efforts by having this new method, whose content is identical to the original one.
4) Repeat with a new biological sequence. Same as above, but much worse, since any *combination* of types can now error

We've gone down a wrong path. I can see two solutions:
1) We never implement two semantically different methods with a type intersection between their arguments. That means we discard `setindex!(::BioSequence, ::Any, ::Vector{Bool})`, and instead have a method where we restrict the middle argument to be e.g. `Union{BioSymbol, Char}` or something, and the other method to have `Union{BioSequence, AbstractVector}`. The disadvantage here is that we lose some ducktyping.
2) We never implement two semantically different method within the same function, full stop. That means we get rid of  `setindex!(::BioSequence, ::Any, ::Vector{Bool})` (and a few other functions like it), and replace them with new functions. I *think* we can do this by overloading broadcasting. The disadvantage here is that I don't know how complex implementing broadcasting is.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Implement broadcasting for biosequences #135

TL;DR

Rationale

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Implement broadcasting for biosequences #135

Description

TL;DR

Rationale

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions