-
Notifications
You must be signed in to change notification settings - Fork 906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add method to remove runtime patterns after run #4236
Conversation
Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
If a dataset is defined in catalog or as a pattern, the catalog would not return as, with the exception of MemoryDataset. In this case it should be still return because the runtime pattern add a memory dataset. Can you clarify why changing the pattern resolution somehow breaks this? |
As I tried to explain above after we add |
Line 90 in ba98135
|
I get this in the second run it will be in the So my expectation here is that, even if it's registered, it will still be return. Line 110 in ba98135
Trace:
This is the result I get with other test, I think the problem is where we define Memory dataset, in 2nd run I expected, y_test, X_test, regressor in the memory_datasets Identify MemoryDataset in the catalog
|
They are not in the catalog because we make
before calling In the new catalog, it will be as you expect. |
Please note that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that we are removing shallow_copy
stuff in the future, this solution makes sense to me! Thanks @ElenaKhaustova 👍🏾
Moved to draft now as want to double-check some cases that @noklam shared. |
Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
For reference I shared my edge cases here: https://github.com/noklam/kedro-runner-bug-investigation/commits/main/ |
Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
The case shared above was also not handled by the previous kendo versions. It relates to the issue raised by the user, so an alternative solution was added to fix both. The issue is that we were processing runtime patterns separately from others and never returning them in the In the solution we move the logic of accessing datasets in the catalog after runtime pattern is added, and the logic of accessing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the change! I think this looks good now. I think we need a regression test case here, the original one shared by the user is a good one, where we run the same catalog twice should get the same result. I can approve this quickly with the test.
Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Added test to run node twice and double-checked that it failed without changes made. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ElenaKhaustova 👍🏾
Description
After the pattern resolution logic refactoring we process all the patterns (
dataset
,default
,runtime
) together.As a result of
run()
we return:kedro/kedro/runner/runner.py
Line 108 in ba98135
Before the
run()
we add a runtime pattern to the catalog, so we could process all intermediate outputs asMemoryDataset
.So currently when we do two consecutive runs #4235 runtime pattern
{default}
added after the first run affects the next runs, so that all datasets match it and we do not return anything as a result ofrun()
cause we think these datasets are in the catalog.Development notes
We think that the resolution logic is correct and all patterns should be processed together as we do now. To avoid this behaviour we added method to remove runtime patters after the run, so they only live within the run and do not affect other runs.
Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a
Signed-off-by
line in the commit message. See our wiki for guidance.If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
RELEASE.md
file