An external liveblog CMS with a very high update frequency is hitting the race condition we feared when using the Content and Story APIs to keep records of these stories in the ARC ecosystem. We realized from the outset that two threads could each use the Content API search endpoint to check for the existence of a story by canonical_url, both receive a response indicating that the story does not exist, and then both create a new story, resulting in duplicates. This happened in production just a few days after we started relying on this check-then-create flow.
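To make the failure mode concrete, here is a minimal sketch of the check-then-create race using an in-memory stand-in rather than the real APIs. `FakeContentStore`, `simulate_race`, and the example URL are hypothetical names for illustration only. A barrier forces both workers to finish their existence check before either one creates, which is the interleaving we believe we hit in production.

```python
import threading


class FakeContentStore:
    """In-memory stand-in for the search and create endpoints (hypothetical)."""

    def __init__(self):
        self._stories = []
        self._lock = threading.Lock()

    def search(self, canonical_url):
        # Each call is individually thread-safe, like a single HTTP request.
        with self._lock:
            return [s for s in self._stories if s["canonical_url"] == canonical_url]

    def create(self, canonical_url):
        with self._lock:
            story = {"id": len(self._stories) + 1, "canonical_url": canonical_url}
            self._stories.append(story)
            return story


def simulate_race(canonical_url, workers=2):
    """Run the check-then-create flow on several threads and return what exists afterwards."""
    store = FakeContentStore()
    barrier = threading.Barrier(workers)

    def sync_story():
        exists = bool(store.search(canonical_url))
        # Every worker finishes its check before any worker creates.
        barrier.wait()
        if not exists:
            store.create(canonical_url)

    threads = [threading.Thread(target=sync_story) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.search(canonical_url)


duplicates = simulate_race("/live/example-liveblog")
print(len(duplicates))
```

Each individual call is safe on its own; the duplicate comes from the gap between the check and the create, which no amount of client-side care inside a single request can close.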
The best mitigation we can come up with is to check for duplicates after the fact, and even that might not catch the duplicate on the first pass. Without an atomic operation keyed on the unique canonical_url, all we can do is move the same kind of race condition somewhere else. For example, if we store the ARC story ID returned from the POST endpoint and use that as our definition of existence, then a second thread can check for the ID before the first thread has stored it.
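A rough sketch of the after-the-fact check we have in mind follows. It assumes the search results for one canonical_url arrive as a list of dicts with `id` and `created_date` fields; those field names and the function `split_duplicates` are assumptions for illustration, not the documented response shape. The one property we care about is that the choice of survivor is deterministic, so that two threads running the cleanup at the same time pick the same story to keep.

```python
def split_duplicates(stories):
    """Return (survivor, duplicates) for stories that share one canonical_url.

    The survivor is the earliest-created story, with ties broken by id, so
    every caller that sees the same set of stories makes the same choice.
    Assumes created_date values are ISO 8601 strings in one consistent
    format, which sort correctly as plain strings.
    """
    if not stories:
        return None, []
    ordered = sorted(stories, key=lambda s: (s["created_date"], s["id"]))
    return ordered[0], ordered[1:]


found = [
    {"id": "B", "created_date": "2024-01-01T00:00:01Z"},
    {"id": "A", "created_date": "2024-01-01T00:00:01Z"},
]
survivor, extras = split_duplicates(found)
print(survivor["id"], [s["id"] for s in extras])
```

This only narrows the problem. If the search results lag behind the creates, a pass can see a single story and report nothing to clean up, so the check would have to be repeated later.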
Is it possible to implement a duplicate-resistant endpoint for creating stories keyed off canonical_url or some other unique value known to the client? Is there some other duplicate story mitigation you can recommend?
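To clarify what we mean by duplicate-resistant, here is a sketch of the contract we are hoping for, again as an in-memory model and not a description of any existing endpoint. `IdempotentStoryStore`, `create_if_absent`, and `create_concurrently` are hypothetical names. The point is that the existence check and the insert happen as one atomic step on the server side, keyed on canonical_url, and that a caller who loses the race gets the existing story back instead of creating a second one.

```python
import threading


class IdempotentStoryStore:
    """Hypothetical model of a create endpoint keyed on canonical_url."""

    def __init__(self):
        self._by_url = {}
        self._lock = threading.Lock()

    def create_if_absent(self, canonical_url, payload):
        """Return (story, created). The check and the insert share one lock."""
        with self._lock:
            existing = self._by_url.get(canonical_url)
            if existing is not None:
                return existing, False
            story = {
                "id": f"story-{len(self._by_url) + 1}",
                "canonical_url": canonical_url,
                **payload,
            }
            self._by_url[canonical_url] = story
            return story, True


def create_concurrently(store, canonical_url, workers=8):
    """Call create_if_absent from several threads and collect every outcome."""
    results = []
    results_lock = threading.Lock()

    def worker():
        outcome = store.create_if_absent(canonical_url, {"headline": "Live updates"})
        with results_lock:
            results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


outcomes = create_concurrently(IdempotentStoryStore(), "/live/example-liveblog")
print(sum(1 for _, created in outcomes if created))
```

With a contract like this, clients could retry freely and would never need to run their own existence check before creating.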