Leverage ibis expressions for getting readable relations #2046
base: devel
✅ Deploy Preview for dlt-hub-docs canceled.
Another note to self: we probably need to run column names through the normalizer. Or we assume the user will use normalized names, as they appear in the schema, when building these expressions.
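If we go the normalizer route, a minimal sketch could map user-supplied names onto schema names before building the expression. This assumes a simplified snake_case rule; `normalize_column_name` and `resolve_columns` are hypothetical helpers, not dlt's actual naming convention, which handles more cases.

```python
import re
from typing import Iterable, List, Set


def normalize_column_name(name: str) -> str:
    """Hypothetical, simplified snake_case normalizer (not dlt's NamingConvention)."""
    # split camelCase: "userId" -> "user_Id"
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    # collapse any run of non-alphanumeric characters into a single underscore
    name = re.sub(r"[^A-Za-z0-9]+", "_", name).strip("_").lower()
    # identifiers should not start with a digit
    if name and name[0].isdigit():
        name = "_" + name
    return name


def resolve_columns(requested: Iterable[str], schema_columns: Set[str]) -> List[str]:
    """Normalize requested names and check that they exist in the table schema."""
    resolved = [normalize_column_name(c) for c in requested]
    missing = [c for c in resolved if c not in schema_columns]
    if missing:
        raise KeyError(f"columns not in schema: {missing}")
    return resolved
```

The alternative (assume users pass already-normalized names) would skip `normalize_column_name` and keep only the existence check.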
df = items_table.df()
assert len(df.index) == total_records

df = double_items_table.df()
Regular dlt dataset execution methods (df, arrow, iter_arrow, ...) work everywhere.
This is really cool! IMO we should keep ibis as an optional dependency (it only works with Python 3.10+), so we have two options:
- separate relation
- enable the proxy behavior if ibis is installed, and fall back to the current behavior if it is not
I'd probably go for the second option. I'm just a little bit worried about the typing.
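One way the second option could look, as a sketch only: detect ibis without importing it and pick an implementation at construction time. To soften the typing concern, the shared subset of methods is declared in a `Protocol`; `SupportsReadableRelation`, `ibis_available` and `make_relation` are hypothetical names, not existing dlt code.

```python
import importlib.util
from typing import Any, Callable, List, Protocol, Tuple


class SupportsReadableRelation(Protocol):
    """Hypothetical subset shared by the ibis-backed and the plain relation."""

    def limit(self, n: int) -> "SupportsReadableRelation": ...

    def select(self, *columns: str) -> "SupportsReadableRelation": ...

    def fetchall(self) -> List[Tuple[Any, ...]]: ...


def ibis_available() -> bool:
    # find_spec checks whether the package is importable without importing it
    return importlib.util.find_spec("ibis") is not None


def make_relation(
    table: str,
    plain_factory: Callable[[str], SupportsReadableRelation],
    ibis_factory: Callable[[str], SupportsReadableRelation],
) -> SupportsReadableRelation:
    # option 2: use the ibis proxy when ibis is installed, otherwise fall back
    if ibis_available():
        return ibis_factory(table)
    return plain_factory(table)
```

A `__getattr__` proxy is invisible to type checkers, so the Protocol keeps at least the common methods typed; anything else ibis offers would stay untyped on the proxy.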
In both cases we should implement a few common expressions we already have in our existing relation (limit, head, column selection).
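For the fallback path, those common expressions can be sketched as an immutable SQL builder, assuming plain double-quoted identifiers. `SqlReadableRelation` is a hypothetical name, and sqlite3 stands in for the real destination client only so the sketch runs.

```python
import sqlite3
from dataclasses import dataclass, replace
from typing import Any, List, Optional, Tuple


@dataclass(frozen=True)
class SqlReadableRelation:
    """Hypothetical fallback relation: builds a SELECT without ibis."""

    conn: sqlite3.Connection
    table: str
    columns: Tuple[str, ...] = ()
    limit_n: Optional[int] = None

    def select(self, *columns: str) -> "SqlReadableRelation":
        # column selection returns a new relation; the original stays unchanged
        return replace(self, columns=tuple(columns))

    def limit(self, n: int) -> "SqlReadableRelation":
        return replace(self, limit_n=n)

    def head(self, n: int = 5) -> "SqlReadableRelation":
        # head is a limit with a default row count
        return self.limit(n)

    def to_sql(self) -> str:
        cols = ", ".join(f'"{c}"' for c in self.columns) if self.columns else "*"
        sql = f'SELECT {cols} FROM "{self.table}"'
        if self.limit_n is not None:
            sql += f" LIMIT {int(self.limit_n)}"
        return sql

    def fetchall(self) -> List[Tuple[Any, ...]]:
        return self.conn.execute(self.to_sql()).fetchall()
```

Returning new instances instead of mutating keeps chaining safe, which is also how ibis expressions behave.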
Regarding the schema: you can do column lineage with sqlglot. It makes sense to invest a little time to understand how it is done:
- https://github.com/tobymao/sqlglot/blob/main/posts/ast_primer.md (by the way, it seems we are not finding tables in expressions correctly)
- https://sqlglot.com/sqlglot/lineage.html
We can add sqlglot as a regular dependency and use it everywhere we have SQL SELECT statements.
Branch updated from b636df4 to b9cf262.
Description
This is a quick and dirty implementation of the idea to use ibis expressions for selecting from our native relations. Basically, I use a ReadableRelation as a wrapper/proxy around an ibis expression and convert it to SQL at the moment of execution, which gives us a very powerful interface for free.
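The proxy pattern can be sketched without ibis itself. Here `ToyExpr` is a hypothetical stand-in for an ibis table expression, and sqlite3 stands in for the destination. With real ibis, the `compile` step would presumably become something like `ibis.to_sql(expr, dialect="duckdb")`; that is an assumption about the wiring, not what this PR does verbatim.

```python
import sqlite3
from typing import Any, List, Optional, Tuple


class ToyExpr:
    """Hypothetical stand-in for an ibis table expression: immutable and lazy."""

    def __init__(self, table: str, columns: Tuple[str, ...] = (),
                 order_col: Optional[str] = None, n: Optional[int] = None) -> None:
        self.table, self.columns, self.order_col, self.n = table, columns, order_col, n

    def select(self, *columns: str) -> "ToyExpr":
        return ToyExpr(self.table, columns, self.order_col, self.n)

    def order_by(self, column: str) -> "ToyExpr":
        return ToyExpr(self.table, self.columns, column, self.n)

    def limit(self, n: int) -> "ToyExpr":
        return ToyExpr(self.table, self.columns, self.order_col, n)

    def compile(self) -> str:
        sql = f"SELECT {', '.join(self.columns) or '*'} FROM {self.table}"
        if self.order_col:
            sql += f" ORDER BY {self.order_col}"
        if self.n is not None:
            sql += f" LIMIT {self.n}"
        return sql


class ReadableRelation:
    """Proxy: forwards expression methods and compiles to SQL only when executed."""

    def __init__(self, expr: ToyExpr, conn: sqlite3.Connection) -> None:
        self._expr = expr
        self._conn = conn

    def __getattr__(self, name: str) -> Any:
        # only called for attributes not defined on the proxy itself
        attr = getattr(self._expr, name)
        if not callable(attr):
            return attr

        def wrapped(*args: Any, **kwargs: Any) -> Any:
            result = attr(*args, **kwargs)
            # re-wrap new expressions so chained calls keep returning relations
            if isinstance(result, ToyExpr):
                return ReadableRelation(result, self._conn)
            return result

        return wrapped

    def fetchall(self) -> List[Tuple[Any, ...]]:
        # the expression becomes SQL only at execution time
        return self._conn.execute(self._expr.compile()).fetchall()
```

Because every call that returns a new expression is re-wrapped, execution methods like `fetchall` (or `df`, `arrow` in the real relation) stay available at every step of the chain.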
Please read the last test in test_read_interfaces. It passes against duckdb (the dialect hardcoded for this example) and shows how you can limit, join, select, order and aggregate, and also chain all of these operations. Many more operations are available on the ibis expression side.
Maybe an ibis expression will actually give us the schema of the result when executed. If we find a way to convert our schemas into ibis schemas, including all hints such as precision, we will be able to discover the result schema and convert it back into a dlt schema. That would be really cool.
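A first step toward that conversion might be a mapping from dlt column hints to ibis dtype strings, which could then be passed to `ibis.schema({...})`. The type mapping, the `(38, 9)` decimal defaults, and the helper names below are assumptions for illustration, and the `"!"` prefix is assumed to be ibis's marker for non-nullable types.

```python
from typing import Any, Dict, Mapping

# Assumed mapping from dlt data types to ibis dtype strings (illustrative, not exhaustive)
DLT_TO_IBIS: Dict[str, str] = {
    "text": "string",
    "double": "float64",
    "bool": "boolean",
    "bigint": "int64",
    "timestamp": "timestamp",
    "date": "date",
    "time": "time",
    "binary": "binary",
    "json": "json",
}


def to_ibis_type(column: Mapping[str, Any]) -> str:
    """Translate one dlt column schema dict into an ibis dtype string."""
    data_type = column["data_type"]
    if data_type == "decimal":
        # carry precision/scale hints over; the defaults here are assumptions
        precision = column.get("precision", 38)
        scale = column.get("scale", 9)
        ibis_type = f"decimal({precision}, {scale})"
    else:
        ibis_type = DLT_TO_IBIS[data_type]
    # mark explicitly non-nullable columns
    if column.get("nullable", True) is False:
        ibis_type = "!" + ibis_type
    return ibis_type


def to_ibis_schema_spec(columns: Mapping[str, Mapping[str, Any]]) -> Dict[str, str]:
    """Build a {name: dtype} dict intended for ibis.schema(...)."""
    return {name: to_ibis_type(col) for name, col in columns.items()}
```

Going back from a result schema to dlt would need the inverse mapping, and any hint ibis does not model would be lost on the round trip.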
Stuff to figure out: