Feature: Multiple Schedulers with semi-reliable locking and failover #195
Comments
I agree with your proposal. PR welcome :)
Thank you for the green light! Will start working on this soon and keep this thread updated. :)
@oxalorg do you have a feature branch I could review?
Hey @russellballestrini, I've made multiple schedulers work here: https://github.com/oxalorg/rq-scheduler/tree/feature-multi-schedulers, although I'm still debating the way it's implemented. At the moment the lock isn't held by a single scheduler continuously; it's only held while a scheduler is moving jobs to the worker queues. So when running multiple schedulers, any one of them can acquire the lock and process the queues. Any thoughts on whether this is how it should be, or should one scheduler hold the lock until it dies/crashes/quits, and only THEN should another scheduler be allowed to try to get the lock? Also, we're not currently storing in Redis which scheduler holds the lock, which schedulers are registered, etc. Those might be useful as well.
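For anyone following along, here is a minimal sketch of that lock-per-pass idea using redis-py. The key name, TTL, and the enqueue_due_jobs helper are illustrative assumptions, not the actual code on the branch:

```python
import os
import socket

import redis

conn = redis.Redis()

# Illustrative values; the real branch may use different key names and timeouts.
LOCK_KEY = "rq:scheduler:lock"
LOCK_TTL = 60  # seconds; should comfortably exceed one enqueue pass
SCHEDULER_ID = f"{socket.gethostname()}:{os.getpid()}"


def enqueue_due_jobs():
    # Placeholder for the real work: move jobs whose scheduled time
    # has passed onto the worker queues.
    pass


def try_acquire_lock():
    # SET with nx=True and an expiry is atomic, so at most one scheduler
    # wins the lock for this pass, and the key expires if the holder dies.
    return bool(conn.set(LOCK_KEY, SCHEDULER_ID, nx=True, ex=LOCK_TTL))


def release_lock():
    # Only delete the lock if we still own it.
    if conn.get(LOCK_KEY) == SCHEDULER_ID.encode():
        conn.delete(LOCK_KEY)


def run_one_pass():
    if not try_acquire_lock():
        return  # another scheduler is handling this pass
    try:
        enqueue_due_jobs()
    finally:
        release_lock()
```

With this shape, no scheduler is special: whichever process wins the lock does the work for that pass, and losing a scheduler only costs redundancy, not scheduling.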
I think what you have is great. Letting all the schedulers race for the lock simplifies the problem and also ensures that a scheduled job doesn't get missed as long as you have at least one functional scheduler.
I don't think we need a concept of registering schedulers. They don't need to communicate with each other, and they don't need consensus.
I think schedulers still need to be registered for troubleshooting/monitoring purposes.
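One lightweight way to keep registration purely for monitoring (a sketch under assumptions; the key prefix, TTL, and helper names here are made up, not rq-scheduler's actual keys) is for each scheduler to refresh its own heartbeat key with a TTL on every loop, independently of the lock:

```python
import os
import socket

import redis

conn = redis.Redis()

HEARTBEAT_TTL = 30  # seconds; the key disappears soon after a scheduler dies
HEARTBEAT_KEY = f"rq:schedulers:{socket.gethostname()}:{os.getpid()}"


def register_heartbeat():
    # Called on every scheduler loop iteration to say "I'm alive".
    conn.set(HEARTBEAT_KEY, "alive", ex=HEARTBEAT_TTL)


def list_live_schedulers():
    # For troubleshooting/monitoring: which schedulers are currently up?
    return [key.decode() for key in conn.scan_iter("rq:schedulers:*")]
```

Crashed schedulers drop out of the listing automatically once their key expires, so there is no explicit death registration to get wrong.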
@oxalorg Is there any update on this at all?
Hey @mattjegan and @selwin, I have created a PR for this feature. 😸 Let me know what you think!
@oxalorg Thanks, I'll give it a shot when I get the chance.
#212 fixes this issue and is now merged to master! 🎊
There have been a lot of issues and PRs referencing this, but no one has gotten it quite right yet. I would like to discuss what we need to get multiple schedulers running for failover.
The rq set of libraries is amazingly simple and I would love to continue using them. I feel like this might be a deal breaker for a lot of folks in adopting RQ + RQ-Scheduler.
Use Case
The feature I'm most interested in is multiple schedulers running at the same time, with only one scheduler active at any moment. If the active scheduler dies for whatever reason, an inactive scheduler becomes active.
This is a very important feature for us, as we're hoping to run the scheduler on multiple servers for failover. (It also makes our deployment easier, since every server stays identical.)
Previous Attempts
#143 seems to be a PR for multiple schedulers, but it introduces a bug where more than one scheduler won't even start/register itself.
#170 tries to fix this by completely removing the birth/death registration, which may not be ideal since we then no longer keep track of all registered schedulers, or of which one is active at any given moment.
In both of the above cases (at first glance, so pardon me if I'm wrong), the locking mechanism doesn't seem reliable and may allow multiple schedulers to acquire the lock at the same time.
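To illustrate the concern (a generic example, not the actual code from #143 or #170): a check-then-set lock leaves a window in which two schedulers can both see the key missing and both proceed as if they hold the lock, whereas an atomic SET NX with an expiry avoids that race and also recovers if the lock holder crashes:

```python
import redis

conn = redis.Redis()

LOCK_KEY = "rq:scheduler:lock"  # illustrative key name


def acquire_lock_unsafe(scheduler_id):
    # Race-prone: between the GET and the SET another scheduler can also
    # see the key as missing, so both end up "holding" the lock.
    if conn.get(LOCK_KEY) is None:
        conn.set(LOCK_KEY, scheduler_id)
        return True
    return False


def acquire_lock_atomic(scheduler_id, ttl=60):
    # Atomic: at most one SET with nx=True succeeds, and the expiry means
    # a crashed lock holder can't block other schedulers forever.
    return bool(conn.set(LOCK_KEY, scheduler_id, nx=True, ex=ttl))
```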
Fix
I would like to propose a fix for these issues and introduce it as a somewhat reliable feature.
A rough plan I have in mind:
Please let me know if a PR like this would be appreciated (via an emoji thumbs-up), and please let me know your thoughts on this, @selwin.