Concurrency bugs in Lucene: How to fix optimistic concurrency failures

aoli-al · 2025-02-22T17:21:43 1740244903

I’m the author of Fray, a concurrency testing framework for the JVM, and I’m excited to finally share what I’ve been building over the past few years!

Fray[1] is a concurrency testing tool for Java that can help you find and debug tricky race conditions that manifest as assertion violations, run-time exceptions, or deadlocks. I’d love to hear your thoughts—feel free to ask me anything! And if you’re curious, give Fray a try.

[1]: https://github.com/cmu-pasta/fray

vlovich123 · 2025-02-22T22:00:52 1740261652

How does this compare with a generic tool like Antithesis? I recognize closed source money vs open source free but from a feature perspective would Antithesis be more effective at finding the issues since it’s not limited to stuff happening in the JVM / can test concurrency of more complicated network topologies between components?

aoli-al · 2025-02-22T22:36:05 1740263765

AFAIK, Antithesis uses a hypervisor to achieve deterministic execution. This can be less effective because the hypervisor does not have language semantics and faces a larger search space. You may check Figures 5 and 6 in our technical report[1], where we compare Fray against RR, a record and replay tool that can also be used for concurrency testing at OS level[2].

[1]: https://arxiv.org/pdf/2501.12618

[2]: https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mo...

_benedict · 2025-02-22T18:12:13 1740247933

We have something very similar[1] we use in the Apache Cassandra project to test complex cluster behaviours.

We appear to use exactly the same basic technique, using byte weaving to intercept concurrency primitives such as synchronized, LockSupport etc to pause the system thread and run them on some schedule.

We currently only run (deterministic) probabilistic traces, though; we can’t search the interleaving space. But the traces for a whole cluster are extremely complex and probably unsearchable.
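To make the idea of pausing threads at intercepted primitives and replaying them on a seeded schedule concrete, here is a minimal sketch. It is not the Cassandra simulator's or Fray's code: every name (`SeededScheduler`, `Worker`, `yieldPoint`, `runDemo`) is hypothetical, and the yield points are written by hand, whereas a real tool would weave them into the bytecode around `synchronized`, `LockSupport`, and so on.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Semaphore;

// Hypothetical names throughout; an illustration, not any real tool's API.
class SeededScheduler {
    interface Task { void run(Worker self) throws InterruptedException; }

    private final Random random;
    private final Semaphore controller = new Semaphore(0);
    private final List<String> trace = new ArrayList<>();

    SeededScheduler(long seed) { this.random = new Random(seed); }

    final class Worker {
        final String name;
        final Semaphore turn = new Semaphore(0);
        boolean done; // published to the scheduler via the semaphores

        Worker(String name) { this.name = name; }

        // Stand-in for an intercepted primitive: record the event, hand control
        // back to the scheduler, and park until this worker is picked again.
        void yieldPoint(String event) throws InterruptedException {
            trace.add(name + ":" + event);
            controller.release();
            turn.acquire();
        }
    }

    // Runs the tasks one step at a time; the seed alone decides the interleaving.
    List<String> run(List<Task> tasks) throws InterruptedException {
        List<Worker> runnable = new ArrayList<>();
        for (int i = 0; i < tasks.size(); i++) {
            Worker worker = new Worker("T" + i);
            Task task = tasks.get(i);
            runnable.add(worker);
            new Thread(() -> {
                try {
                    worker.turn.acquire();   // wait for the first turn
                    task.run(worker);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    worker.done = true;
                    controller.release();
                }
            }).start();
        }
        while (!runnable.isEmpty()) {
            Worker next = runnable.get(random.nextInt(runnable.size()));
            next.turn.release();             // let exactly one worker run...
            controller.acquire();            // ...until it yields or finishes
            if (next.done) runnable.remove(next);
        }
        return trace;
    }

    // Two workers increment a shared counter with a yield between read and write.
    static String runDemo(long seed) {
        int[] counter = {0};
        Task increment = self -> {
            int seen = counter[0];
            self.yieldPoint("read");         // a switch here can lose an update
            counter[0] = seen + 1;
            self.yieldPoint("write");
        };
        try {
            List<String> trace = new SeededScheduler(seed).run(List.of(increment, increment));
            return trace + " counter=" + counter[0];
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo(42L));
        System.out.println(runDemo(42L));    // same seed, same trace
    }
}
```

Because only one worker is ever unparked and the choice comes from a seeded `Random`, rerunning with the same seed reproduces the same trace, which is what makes a failing schedule debuggable.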

I have been meaning to publish it for broader consumption for years now, but there’s always something more important to do. It’s great to see some dedicated efforts in this space.

[1] https://github.com/apache/cassandra/tree/trunk/test/simulato...

aoli-al · 2025-02-22T20:05:45 1740254745

This looks super cool!

It seems that all controlled threads are wrapped with `InterceptibleThread` in the Cassandra simulator. Does this work for thread pools (e.g., ForkJoinPool) as well? We had a hard time intercepting thread objects because the language runtime uses them too (e.g., for GC threads), and we don’t want to interfere with those. Additionally, modifying application code just to track thread creation isn’t ideal. To work around this, we combine JVMTI with a Java agent and use JVMTI to monitor thread creation and termination.

As for searching schedules: yes, it is hard to search all possible schedules. However, it turns out that many search algorithms, such as probabilistic concurrency testing[1] or partial order sampling[2], still do better than a random walk. So it is worth giving them a try.

[1] https://www.microsoft.com/en-us/research/wp-content/uploads/...

[2] https://www.cs.columbia.edu/~junfeng/papers/pos-cav18.pdf
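For a sense of what "all possible schedules" means, here is a small sketch that enumerates every interleaving of a toy model by brute force. It is neither PCT nor partial order sampling; it is the exhaustive baseline that those algorithms sample from instead of enumerating. The model (each thread does one read and one write of a shared counter) and the class name `InterleavingSearch` are assumptions made for illustration.

```java
// Hypothetical toy model: each thread runs two steps, "read counter" then
// "write read value + 1". A schedule is buggy if an increment is lost.
class InterleavingSearch {
    int total;        // complete schedules explored
    int lostUpdates;  // schedules whose final counter is below the thread count
    private final int threads;

    InterleavingSearch(int threads) { this.threads = threads; }

    void explore() {
        explore(new int[threads], new int[threads], 0);
    }

    // pc[t]: 0 = about to read, 1 = about to write, 2 = finished.
    // local[t]: the value thread t read. counter: the shared variable.
    private void explore(int[] pc, int[] local, int counter) {
        boolean anyEnabled = false;
        for (int t = 0; t < threads; t++) {
            if (pc[t] == 2) continue;
            anyEnabled = true;
            int[] nextPc = pc.clone();
            int[] nextLocal = local.clone();
            int nextCounter = counter;
            if (pc[t] == 0) {
                nextLocal[t] = counter;        // read step
            } else {
                nextCounter = local[t] + 1;    // write step
            }
            nextPc[t]++;
            explore(nextPc, nextLocal, nextCounter);
        }
        if (!anyEnabled) {                     // every thread finished
            total++;
            if (counter != threads) lostUpdates++;
        }
    }

    public static void main(String[] args) {
        for (int n = 2; n <= 4; n++) {
            InterleavingSearch search = new InterleavingSearch(n);
            search.explore();
            System.out.println(n + " threads: " + search.lostUpdates
                    + " of " + search.total + " schedules lose an update");
        }
        // First line printed: "2 threads: 4 of 6 schedules lose an update"
    }
}
```

With k steps per thread and n threads there are (nk)!/(k!)^n schedules, so two threads give 6 and three already give 90. Real programs have far more steps, which is why sampling strategies matter.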

_benedict · 2025-02-22T20:48:26 1740257306

We do currently require all threads to be created by one of our own factories, but that's primarily because this grew out of a non-byte weaving approach (where we explicitly replaced our concurrency primitives). Looking at the class now, all of its state could easily be stashed in either global or ThreadLocal variables, so I don't see anything that would stop us working with FJP etc.

> Additionally, modifying application code just to track thread creation isn’t ideal.

This would certainly be necessary, but don't you anyway need to rewrite the application to trap synchronised, volatile, atomic accesses etc? It doesn't seem all that different to rewrite calls to Thread::start. The issue of JVM threads is perhaps a little trickier, but I am not averse to some ugly integrations. Just take a look at how we make RNGs deterministic

> So it is worth giving them a try.

Thanks for the tips! I am not sure when I will have time to apply these techniques to our simulator, but they are no doubt valuable for the protocol simulations I am relying on today, so maybe I will have a justification to explore them sometime soon.

Really cool work too. I hope it manages to make its way into more hands, so that this technique can be used more widely.

vlovich123 · 2025-02-22T22:07:13 1740262033

> The motivation behind building Fray stems from a noticeable gap between academia and industry: while deterministic concurrency testing has been extensively studied in academic research for over 20 years, practitioners continue to rely on stress testing—a method widely acknowledged as unreliable and flaky—to test their concurrent programs.

To be fair, the gap is because writing the tests is the hard part. Tests for deterministic testing frameworks can be more complicated because you have to simulate more complex situations with more components interacting (otherwise the simpler targeted tests would have caught your bug). So it works well in terms of making your existing integration tests gain extra value, but the complexity of writing and maintaining those integration tests is the actual challenge.

Don’t get me wrong. I love deterministic simulation testing; along with property tests and mutation testing, it’s among the best-in-class techniques for gaining confidence in the efficacy of your tests. It’s just that the challenges are on the less sexy side of writing the tests, whereas academia focuses on the sexy frameworks piece.

aoli-al · 2025-02-22T22:29:56 1740263396

Using Fray does not require knowledge about "deterministic testing" or "controlled concurrency." This is one of its goals: developers write normal concurrency tests, and Fray controls the execution behind the scenes.

In fact, when we evaluated Fray, we collected all the existing concurrency tests from Lucene, Kafka, and Guava; simply running them under different thread interleavings already revealed many bugs. [1]

[1]: https://github.com/cmu-pasta/fray/blob/main/docs/bugs.md
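As an illustration of what a "normal" concurrency test might look like, here is a plain-Java sketch with no testing-framework API in it. The class `LazyInitTest` and its check-then-act bug are made up for this example and are not taken from Lucene, Kafka, or Guava. How such a test is handed to Fray is not shown, because the thread does not describe it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example: a lazily initialized value with a check-then-act bug.
class LazyInitTest {
    static final AtomicInteger constructions = new AtomicInteger();
    // volatile, so there is no data race on the field itself; the bug is that
    // the null check and the assignment are not one atomic step.
    static volatile Object instance;

    static Object get() {
        if (instance == null) {                // check
            constructions.incrementAndGet();
            instance = new Object();           // act
        }
        return instance;
    }

    // Two threads call get() at roughly the same time; returns how many
    // times the object was constructed.
    static int runOnce() {
        instance = null;
        constructions.set(0);
        CountDownLatch start = new CountDownLatch(1);
        Runnable caller = () -> {
            try {
                start.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            get();
        };
        Thread a = new Thread(caller);
        Thread b = new Thread(caller);
        a.start();
        b.start();
        start.countDown();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return constructions.get();
    }

    public static void main(String[] args) {
        int n = runOnce();
        // Property under test: the object is constructed exactly once.
        System.out.println(n == 1 ? "passed" : "FAILED: constructed " + n + " times");
    }
}
```

Run as an ordinary stress test, this passes on most executions because the window between the check and the assignment is tiny. That is the flakiness the thread is about: the failing interleaving exists, but the OS scheduler rarely picks it.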

vlovich123 · 2025-02-22T22:56:54 1740265014

Writing good “normal” concurrency tests is hard is what I’m saying. I get that it slots in well with existing tests that are already written.

nyanpasu64 · 2025-02-22T22:17:21 1740262641

How do you know a program is free of data races?

aoli-al · 2025-02-22T22:25:58 1740263158

Fray does not know whether a program is free of data races. If a program does contain data races, Fray can still find bugs in it, but the soundness guarantee no longer holds, so Fray may miss bugs caused by those races.

comrade1234 · 2025-02-22T16:33:03 1740241983

Was Lucene the project that started at Apple as part of Cyberdog, or whatever that old email client was called?

Depressing that Apple Mail had better search 25 years ago than today.

softwaredoug · 2025-02-22T16:35:18 1740242118

I believe it was Doug Cutting’s side project to learn Java. It was like his 5th search engine.

comrade1234 · 2025-02-22T16:47:27 1740242847

I see on the wiki his third was at Apple. It must have been in Cyberdog, because the search results were so good and it also had similar search rules/wildcards/etc. to Lucene later.

I remember working with Lucene around 2000/2001 and how good the results were.