Django does not have OUTER JOIN functionality,
and this library intend to introduce outer joins into the Django ORM.
And OUTER JOIN is emulated by creating an unmanaged model:
from django.db import models
from outer_join import OuterJoin
class A0(models.Model):
key = models.IntegerField(unique=True)
field1 = models.IntegerField()
field2 = models.IntegerField()
class A1(models.Model):
key = models.IntegerField(unique=True)
field1 = models.IntegerField(null=True)
field3 = models.IntegerField()
class A(models.Model):
key = models.IntegerField(unique=True)
field1 = models.IntegerField(null=True)
field2 = models.IntegerField(null=True)
field3 = models.IntegerField(null=True)
class Meta:
managed = False
outer_join = OuterJoin(
A1, A0,
on='key',
)
objects = outer_join.get_manager()()By overwriting the manager, we intercept the SELECT statement
and turn it into an OUTER JOIN with underlying tables.
When there is a field that exists in more than one model (e.g. field1 above),
a COALESCE is used in the order the models are passed into OuterJoin.
You can think of it as A1.field1 overwrites A0.field1 if A1.field1 is not null.
- Python >= 3.6 (because we love type hints)
- Django >= 2
- Note for Django >= 2.2, the related manager behavior has changed, and is not yet supported by this package. Please refer to the 2.1 doc.
Also, if you are using an intermediate model for a many-to-many relationship, then the
add(),create(),remove(), andset()methods are disabled.
- Note for Django >= 2.2, the related manager behavior has changed, and is not yet supported by this package. Please refer to the 2.1 doc.
- PostgreSQL
- We only test/support Postgres
- Does not work with sqlite because it doesn't have
OUTER JOIN - Django use different
SQLCompilerclasses for MySQL and Oracle, so they most likely will have issues - Should work fine with other SQLs
See models and tests under the tests folder, which is a Django project.
Each app is a use case, which contains a full set of models
and potential usages in test files.
pip install git+git://github.com/SelfHacked/django-outer-join.git#egg=django-outer-joinThe outer_join.OuterJoin class provides read-only access.
It should be declared within a model (on the same level as fields and managers).
It is only intended to provide managers and will not be accessible at runtime.
__init__*models: Type[Model]- At least must be provided.
- Order is important because the way
COALESCEworks.
queryset: Type[QuerySet]- A base queryset class for all managers.
on: Union[str, Sequence[str]]- Field(s) to join on.
get_manager -> Type[Manager]*(force kwargs, see PEP 3102)filter_initial_queryset: Union[QuerySetFilter, Sequence[QuerySetFilter], None]QuerySetFilteris a function (Queryset) ->QuerySet. Seeouter_join.extra.querysetfor convenience implementations.- When multiple functions are provided, they will be applied in order.
For the model itself, all fields that can be null from an outer join result should have null=True.
The model must have managed = False (obviously),
and all managers should be generated from get_manager method.
The outer_join.WritableOuterJoin class provides read-write access.
It is inherited from the OuterJoin class.
The first model will be used as read-write, and others will be read-only.
To make deletions logically work, we provide a convenience class
outer_join.extra.models.AbstractDeleteRecord
which is an abstract model with a boolean field is_deleted
and overwrites insertion and deletion methods.
When using this model,
- In the inherited model,
If the primary key is not the
onfield in the writable model,_save_check_fieldsneeds to be specified. It is implemented this way so that the create methods for the outer join model is not dependent on usingAbstractDeleteRecord, and vice-versa. - In the outer join model,
The default manager (
objects) should also have an initial filter.
See test01 for example.
Because of how COALESCE works,
setting a field to null will only set it to null in the read-write table,
and it will revert to whatever value read-only table(s) provide.
The read-write model should have null=True or default=
for all fields that are not on.
Be careful with objects returned from create or calling the class constructor,
since they will not contain any information from the database,
even after they have been saved.
It is recommended to always re-select the object after a database save call.
Whenever there is a relation that involves a outer-join model, including
- ForeignKey from
- ForeignKey to
- M2M
extra steps must be taken for the outer-join model itself.
- The outer-join model must have a manual primary key, with the exception of M2M through models (see below). It could simply be the
onfield. The base models don't have this requirement. base_manager_namein the meta options must be specified, and it should be generated manager without an initial queryset filter.
It is thus recommended to always have these set up for all outer-join models, in case the models will be used in relations in the future.
At the mean time, these steps must be take on the other side:
- All related "normal" models must use the
outer_join.OuterJoinInterceptorclass so that sql generation can work correctly. It is similar to use asOuterJoinbut only takesquerysetargument. - Relations into it must have
db_constraint=False, because it's not referencing a real table.
See test04,
test05 and
test06 for example.
Relations in the base tables can still be set up the way they are intended.
Specifically for M2M relations,
if the relation itself doesn't require outer-join,
then a through model is not needed,
but db_constraint must be set to False.
See test07 for example.
If the relation itself requires outer-join,
a through model should be manually provided.
See test08 for example.
For relation fields in base models,
you can use related_name='+' to disable related managers.
Sometimes a single primary key for a model that has multiple on fields is required,
e.g. when building an url.
We provide a way to fake a primary key with all the on fields.
outer_join = OuterJoin(
...,
on=['a', 'b'],
)
primary_key = outer_join.get_primary_key()()get_primary_key -> Type[Field]base_class: Type[Field]- Default is
TextField. - You can replace this with a field type, e.g.
SlugField. However, validation is probably not guaranteed (not tested). primary_keyis set toTrue. You can only have one primary key field in a model, obviously.
- Default is
*format_pk: Callable[[Tuple], str]- Used to format a pk for objects selected from database.
- Default is
outer_join.extra.fake_pk.hyphen_join. - Format the
onfields, in the order they are specified, into a primary key. - Do not use default if there are
'-'s in youronfields.
parse_pk: Callable[[str], Tuple]- Used to parse pk in a
pk_exact=lookup. - Default is
outer_join.extra.fake_pk.hyphen_split. - This function should reverse
format_pk.
- Used to parse pk in a
See the Through model in test08 for example.
By default, the primary key will be a - joined string of all on field values.
The primary key should only be used in SELECT statements,
and only exact lookup (pk= or pk__exact=) is supported.
It will be populated however for objects selected through other means.
Django will add a CreateModel for unmanaged models in migrations,
but will not update them.
If your unmanaged model does not have a manual primary key,
you can ignore all the fields in the migration
or even remove them (just use fields=[]).
However, Django will still use the model specified in the migration to recreate the db state and decide what to do in dependent migrations. Thus, if there is a manual primary key referenced in other managed models, it must also be maintained manually in the migration history, so that any foreign key will have the correct type. See this StackOverflow post.
If you are using the fake primary key above,
you can use the base type in the migration, with primary_key=True.
The default base type is models.TextField.
See test03/migrations/0001.
The FROM table is extended into FULL OUTER JOIN with all base tables.
Outside of the FROM clause,
all columns in the outer-join model is transformed:
- If it exists in only one base model, we use the column from that model;
- If it exists in multiple models, we convert it into a
COALESCEcall- If
COALESCEis in theSELECTclause, we also apply an alias (AS) so that the column name is preserved.
- If
Django introduces JOIN statements in various occasions,
notably relation filters (related_name__field),
select_related, and M2M queries.
Consider this case:
FROM A INNER JOIN BWhere B is an outer-join model.
The safest, and probably only logically correct solution is to
introduce B as a subquery.
This way, we do not need to change anything in the outer query at all.
FROM A INNER JOIN (
B1 FULL OUTER JOIN B0
) AS BThere will apparently be performance complications, but it's the best solution without changing Django SQL generation significantly.
For an INSERT statement,
it logically requires that the record doesn't exist in the OUTER JOIN result,
so it must not exist in the read-write table.
We simple pass on the statement to the read-write model.
When using AbstractDeleteRecord,
it will then see if an object is in the table with is_deleted=True.
If yes, delete the object first before attempt to INSERT.
This logic is implemented in AbstractDeleteRecord.
We must first know whether an object is in the read-write table or not,
before deciding to run INSERT OR UPDATE on the object.
Whether an object exists or not is determined by querying the on column(s).
For objects that don't exist in the read-write table, we insert them; otherwise we update them.
If the call is batch, we use batch insert/update.
For objects that don't exist in the read-write table; we create them first; and then we delete all the objects.
If the call is batch, we update batch insert/delete.
This might sound redundant,
but it guarantees any custom delete logic in the read-write model is applied to all objects,
which is especially important for AbstractDeleteRecord.