How to handle locking for this?

I’ve an application running in two containers on two different systems.

Each container runs an infinite loop that wakes up every few seconds. When it wakes up, it tries to acquire an optimistic lock from DynamoDB (it tries to update an item with a JIT-generated UUID and a timestamp). If it succeeds, it reads items from one database and pushes them as messages onto a queue. At the end, it updates DynamoDB with the unique ID of the last message it got from the database, and that last message ID becomes the checkpoint for the next run. If it fails to acquire the lock, it goes to sleep and tries again a few seconds later.
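For reference, here is a minimal sketch of the acquire and release steps described above. It is not the real code: `InMemoryLockTable` and its methods are hypothetical stand-ins for the DynamoDB item, and it assumes the lock is held until the owner explicitly releases it. In DynamoDB the `put_if_owner_is` step would presumably be a conditional write (an update with a condition expression).

```python
import time
import uuid


class InMemoryLockTable:
    """Hypothetical stand-in for the DynamoDB lock item (not a real AWS API)."""

    def __init__(self):
        self.item = {"owner": None, "acquired_at": None, "checkpoint": None}

    def put_if_owner_is(self, expected_owner, new_owner, now):
        # Mimics a conditional write: succeed only if the stored owner
        # still matches what the caller expects.
        if self.item["owner"] != expected_owner:
            return False
        self.item["owner"] = new_owner
        self.item["acquired_at"] = now
        return True


def try_acquire(table):
    """Return a fresh lock UUID on success, or None if the lock is held."""
    lock_id = str(uuid.uuid4())
    if table.put_if_owner_is(None, lock_id, time.time()):
        return lock_id
    return None


def release(table, lock_id, last_message_id):
    """Write the checkpoint and clear the lock, but only for the current owner."""
    if table.item["owner"] != lock_id:
        return False
    table.item["checkpoint"] = last_message_id
    table.item["owner"] = None
    return True
```

With this shape, a second `try_acquire` returns `None` until the first holder calls `release`, which is the behaviour the loop relies on.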

My question is… what happens if container A wakes up, acquires the lock, and begins processing the events from the DB, but it takes too long and container B wakes up and begins its run before A can update the checkpoint fields? Then I have two containers both feeding duplicate messages downstream because neither knows the other is still running.

I’ve thought about updating the checkpoint and the lock UUID with every message processed, but that’s a lot of extra network calls, and it makes the problem more likely to come up because it increases the runtime. This is the current “best” answer.
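A rough sketch of that per-message checkpoint idea, under assumptions: `LockItem`, `save_checkpoint`, and `process` are hypothetical names, and the item is an in-memory stand-in rather than DynamoDB. The `every` parameter is an addition for illustration only. With `every=1` it writes after each message as described above; a larger value would cut the number of calls at the cost of a few possible duplicates after a takeover.

```python
class LockItem:
    """Hypothetical in-memory stand-in for the single DynamoDB lock item."""

    def __init__(self, owner, checkpoint=None):
        self.owner = owner
        self.checkpoint = checkpoint
        self.writes = 0  # counts successful checkpoint writes

    def save_checkpoint(self, lock_id, message_id):
        # Conditional write: refuse if another container now owns the lock.
        if self.owner != lock_id:
            return False
        self.checkpoint = message_id
        self.writes += 1
        return True


def process(item, lock_id, messages, publish, every=1):
    """Publish messages, saving the checkpoint after every `every` messages.

    Stops as soon as a checkpoint write is rejected, which means this
    container no longer owns the lock. Returns the number published.
    """
    pending = None
    published = 0
    for message_id in messages:
        publish(message_id)
        published += 1
        pending = message_id
        if published % every == 0:
            if not item.save_checkpoint(lock_id, pending):
                return published
            pending = None
    # Cover a tail that did not line up with `every`.
    if pending is not None:
        item.save_checkpoint(lock_id, pending)
    return published
```

The useful property is that a container that has lost the lock finds out at its next checkpoint write and stops, so the overlap is bounded by `every` messages rather than by the whole run.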

I’ve thought about adding a “lock expiry” time attribute to the lock item and making B sleep until that time, but that relies on A successfully guessing how long it will take to process a list of items whose length it doesn’t know yet. I could definitely make the expiry pessimistic, say double the usual runtime, but that still doesn’t guarantee the issue won’t come up.
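The expiry idea can be sketched like this. It is only an illustration: `LeaseItem` is a hypothetical in-memory stand-in for the lock item, and the clock is passed in as `now` so the behaviour is easy to follow.

```python
import uuid


class LeaseItem:
    """Hypothetical in-memory stand-in for a lock item with an expiry attribute."""

    def __init__(self):
        self.owner = None
        self.expires_at = 0.0

    def try_acquire(self, now, lease_seconds):
        # Take the lock only if it is free or its lease has run out.
        if self.owner is not None and now < self.expires_at:
            return None
        self.owner = str(uuid.uuid4())
        self.expires_at = now + lease_seconds
        return self.owner

    def seconds_until_free(self, now):
        # How long a waiting container should sleep before retrying.
        return max(0.0, self.expires_at - now)


lease = LeaseItem()
a = lease.try_acquire(now=100.0, lease_seconds=20.0)  # A takes the lock
b_early = lease.try_acquire(now=110.0, lease_seconds=20.0)  # B is refused
wait = lease.seconds_until_free(now=110.0)  # B should sleep this long
b_late = lease.try_acquire(now=120.0, lease_seconds=20.0)  # lease ran out
```

The last call is the weak spot described above: once the lease has run out, B gets the lock whether or not A has actually finished, so a run that outlasts its lease still overlaps.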

Anyone have any thoughts on a better way to do this?

submitted by /u/Flakmaster92

from Software Development – methodologies, techniques, and tools. Covering Agile, RUP, Waterfall + more! https://ift.tt/VmEJBg5
