We Loved It. We Reject It: A Case Study in Anonymous Peer Review Without Accountability

How three ACM ICS 2026 reviewers praised our work as “conceptually novel,” “intellectually stimulating,” and solving a “long-standing problem”—then rejected it because we didn’t compare our Tesla to their Pontiac. A case study in broken peer review, and why we need two-way eBay-like accountability. Please share if this post resonates with you.

Feb 13, 2026

“It’s not that you’re wrong, it’s just that I’m not brave enough to say you’re right”
— National Geographic’s Genius (Season 1)

That quote captures anonymous academic review perfectly. What follows is a real story happening now, on Friday the 13th, the ICS 2026 rebuttal deadline.

This is about how three ACM ICS 2026 reviewers praised our work as “conceptually novel,” “intellectually stimulating,” and solving a “long-standing problem”—then rejected it. Not because it’s wrong, but because it challenges 30 years of accepted doctrine and makes previous work look over-engineered.

This post is not about the rejection itself — if it were based on merit, there would be no issue. However, when all reviewers describe a paper as innovative and technically sound, assign strong expertise scores (4/4 each), yet the outcome is rejection based on demonstrable biases and errors, this is no longer about technical merit. It’s a fundamental failure of process integrity, and that’s exactly what this post addresses.

To be precise: when I say process integrity, I mean the absence of accountability that anonymity enables. Aware they operate beyond scrutiny, these reviewers strayed outside the stated scope, introduced irrelevant comparisons, and made factually incorrect claims—as I will demonstrate point by point—all without accountability or recourse.

Background

We are open-sourcing technology from mesibo and PatANN. We submitted a detailed paper on our lock-free queue, which is 1.7-9× faster than Moodycamel, Boost, Intel TBB, and Meta Folly, and has been deployed in production for over 4 years.

Here’s what happened in academic review…

Act I: The Lavish Praise

Picture this: You submit a paper to ACM ICS 2026. Three anonymous reviewers evaluate your work.

All three rate their expertise as 4 out of 4 (the highest rating in the ACM system). I’m honored!

Here’s what they wrote about our algorithm:

“Conceptually novel.”
“Important and stimulating.”

Reviewer A:

  • “Relevant topic for ICS, and important in the context of lock-free memory management.”
  • “Good explanation of motivation, and good coverage of and positioning against related work.”

Reviewer B:

  • “The temporal-protection approach is conceptually novel.”
  • “The paper provides a solid overview of trade-offs among existing memory-reclamation techniques.”

Reviewer C:

  • “Tackles an important and long-standing problem in memory reclamation of lock-free data structures.”
  • “Offers an inspiring perspective on existing work.”
  • “Presents a conceptually simple reclamation mechanism.”
  • “Eliminates coordination requiring per-thread metadata and scanning them costing O(number of threads).”
  • “Demonstrates better performance against production baselines at high thread counts.”
  • “I enjoyed reading the paper and the stimulating discussion it raises.”

Sounds great, right? “Novel approach. Solves long-standing problem. Better performance. Intellectually stimulating. Enjoyed reading it.”

Brilliant. We’re getting published. Time to book flights for Supercomputing 2026.

But I didn’t know ICS had M. Night Shyamalan directing their review process…

Act II: The Plot Twist

But we REJECT your paper

Wait, what? Let me read that again.

“Conceptually novel” → Reject
“Solving long-standing problem” → Reject
“Better performance” → Reject
“Intellectually stimulating” → Reject
“Enjoyed reading” → Reject

How does this happen? I asked a senior professor at one of the Bay Area’s top universities. His response:

“Yusuf, this is a problem. Anonymous review means reviewers never have to explain how something can simultaneously be ‘novel,’ ‘important,’ ‘stimulating,’ AND not worth publishing. They also don’t have to disclose potential conflicts of interest.”

He warned me against writing this post. More on that later. But first, let’s examine their “thoughtful” reasoning:

Act III: The Real Reasons (AKA The Absurdist Comedy)

Before we dissect their reasoning, here’s the paper: online, pdf. Now let’s look at each reviewer’s contentions one by one:

Reviewer A’s Killer Blow: “But What About Trees?”

After praising our queue algorithm, Reviewer A drops this bomb:

“It is not clear how to apply something like this, to e.g. lock-free binary trees, in which relationships between lifetimes of nodes are not partly known upfront.”

I’m totally puzzled. Did this “expert” reviewer even read the paper title?

Mr. Bean, this paper is about QUEUES.

Not trees. Not hash tables. Not distributed systems. Not blockchain. Not your grandmother’s recipe database.

It’s about QUEUES and only QUEUES.

The title literally says: “Coordination-Free Concurrent Lock-Free QUEUES.”

The word “QUEUE” is repeated at least 18 times in the title, scope, and introduction alone. But this reviewer is rejecting a queue paper because it doesn’t solve tree reclamation.

This is like rejecting a chocolate cake recipe because the technique doesn’t apply to Biryani. Or, rejecting a paper on sorting algorithms because the reviewer is “unclear how quicksort applies to matrix multiplication.” You can’t be serious.

The reviewer doesn’t stop here, desperately searching for flaws:

“Moreover, the approach with the cycle value implies a global counter, which is a source of contention.”

As you can easily see, the reviewer is fundamentally confused, conflating contention with coordination—the very distinction this paper is built on.

Contention from a shared counter is handled efficiently at the CPU level. Every FIFO queue has this—even the M&S tail pointer. This is cheap.

What we eliminated is coordination: the O(P×K) cost of scanning hazard pointers across all threads, epoch synchronization barriers, inter-thread handshakes. This is expensive, and this is the bottleneck we removed.

You will immediately recognize the problem: the reviewer is again missing the scope and the bigger picture. Moreover, our cycle counter adds zero new contention; it leverages contention that already exists in any FIFO queue.
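To make the distinction concrete, here is a minimal C++ sketch. This is my illustration of the two cost models, not code from the paper; the names (`cycle`, `next_cycle`, `safe_to_free`) and the P×K sizes are invented for the example:

```cpp
#include <atomic>
#include <cstdint>

// Contention: many threads bump one shared counter. The hardware
// arbitrates ownership of the cache line; no thread ever inspects
// another thread's state. Every FIFO queue already pays a cost of
// this shape at its tail pointer.
std::atomic<std::uint64_t> cycle{0};

std::uint64_t next_cycle() {
    // One atomic read-modify-write: contended, but coordination-free.
    return cycle.fetch_add(1, std::memory_order_relaxed);
}

// Coordination: hazard-pointer-style reclamation. Before freeing a
// node, the reclaimer must walk EVERY thread's K hazard slots.
constexpr int P = 8;  // threads (illustrative)
constexpr int K = 2;  // hazard slots per thread (illustrative)
std::atomic<void*> hazard[P][K];  // static storage: starts as nullptr

bool safe_to_free(const void* node) {
    for (int t = 0; t < P; ++t)          // O(P*K) scan across all
        for (int k = 0; k < K; ++k)      // threads' metadata: this is
            if (hazard[t][k].load(std::memory_order_acquire) == node)
                return false;            // some thread still holds it
    return true;
}
```

The `fetch_add` is the cheap kind of sharing: threads collide on one cache line and the CPU sorts it out. The `safe_to_free` scan is the expensive kind: its cost grows with the number of threads, which is the O(P×K) coordination the post says the paper eliminates.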

But in anonymous review, you can write “the sky is green” and face zero consequences.

Reviewer B’s Masterpiece: “Compare Your Production System to a Museum Piece”

Reviewer B starts strong:

“The temporal-protection approach is conceptually novel.”

Then delivers the googly (if you know cricket):

“The implementation is not evaluated against state-of-the-art lock-free FIFO queues, such as LCRQ.”

Did the reviewer read Section 4? We explicitly explained why LCRQ wasn’t included.

But the issue is bigger: this comment clearly reflects a disconnect between theoretical assumptions and production-system reality. LCRQ, the “state-of-the-art” algorithm, is a research artifact with limited evidence of production deployment:

  • Published: 2013 (12 years ago)
  • Double jeopardy: Depends on 128-bit CAS and requires coordinated ring transitions under saturation — precisely the bottleneck this paper eliminates
  • Available in programming languages:
    • Java? No
    • Go? No
    • Rust? No
    • Python? No
    • JavaScript? No
    • C++? Only via non-portable intrinsics
  • Production deployments in 12 years: virtually zero. To date, there is no well-known large-scale production deployment comparable to mainstream lock-free queues.
  • Used primarily in academic literature, making it sacred. In some review cultures, citation count substitutes for deployability.

This comment also suggests Reviewer B never read the LCRQ paper, or doesn’t understand it, and clearly skimmed ours. Most importantly, all these reasons are explained in Section 4 of our paper, which could only be missed if this was a bedtime review. But who cares when the review is anonymous?

The paper is clear about its scope: production systems, the ones that power real workloads in the machine-learning age, with thousands of threads that can’t afford coordination. We are not comparing against something that remains a research artifact with limited production evidence.

Our baselines:

  • Moodycamel: 4,500+ GitHub stars, deployed in Unreal Engine, high-frequency trading, game engines, millions of production deployments
  • Boost.Lockfree: Enterprise C++ standard, shipped in countless production systems
  • LCRQ: Can’t even find a canonical implementation used in production

Still, Reviewer B wants us to compare a production-deployed queue against a 12-year-old research prototype that never left the laboratory.

Reviewer B now enters god mode:

“The temporal-protection scheme seems infeasible in practice because it relies on knowing worst-case operation durations, which can be unbounded.”

This makes me wonder: was this Netflix and Read? First missing Section 4, now Section 2.3.3 too? Because we explicitly address the unbounded delay concern there:

Section 2.3.3: “A crashing or stalled thread is a separate bug to fix—not a justification for taxing the fast path 99.9% of the time. Worse, ‘infinite’ protection inevitably leads to ‘infinite’ leak duration.”

For anyone familiar with billion-scale production systems, this is easy to understand.
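For readers unfamiliar with the idea, here is a minimal sketch of what bounded temporal protection looks like, under the bounded-delay assumption the post describes. This is illustrative only, not the paper’s implementation; `MAX_OP`, `retire`, and `reclaim` are names I made up, and the 10 ms bound is a placeholder:

```cpp
#include <chrono>
#include <cstddef>
#include <deque>

using Clock = std::chrono::steady_clock;

// Assumption (the bounded-delay model): every queue operation finishes
// within MAX_OP. A thread stalled longer than this is treated as a bug
// to fix, not a reason to tax the fast path.
constexpr auto MAX_OP = std::chrono::milliseconds(10);  // illustrative bound

struct Retired {
    void* node;
    Clock::time_point when;  // just a timestamp: no hazard publication,
};                           // no epoch announcement, no handshake

std::deque<Retired> retired_list;

void retire(void* node) {
    retired_list.push_back({node, Clock::now()});
}

// Reclaim everything older than the worst-case operation duration.
// No scan of other threads' state is needed: the passage of time alone
// proves that any operation that could still reference the node is done.
std::size_t reclaim(Clock::time_point now) {
    std::size_t freed = 0;
    while (!retired_list.empty() &&
           now - retired_list.front().when > MAX_OP) {
        // ::operator delete(retired_list.front().node);  // real free here
        retired_list.pop_front();
        ++freed;
    }
    return freed;
}
```

The point of the sketch is what is absent: `reclaim` never reads another thread’s metadata, so the per-free cost is O(1) regardless of thread count, at the price of holding memory for one `MAX_OP` window.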

For the record, we’ve been running this “infeasible” scheme in production for 4+ years, with millions of users:

  • mesibo: 13,000+ deployments across 40 countries
  • PatANN: Hundreds of concurrent indexing threads

To be honest, labeling a production-deployed algorithm as “infeasible in practice” is an amusing display of academic imagination. One cannot help but wonder how reviewer suitability for systems research is evaluated.

Reviewer C’s Philosophical Journey: From “Trivial” to “Stimulating”

Reviewer C takes us on an emotional rollercoaster:

The Setup (generous praise):

“Tackles an important and long-standing problem” “Offers an inspiring perspective”
“Demonstrates better performance”
“I enjoyed reading the paper and the stimulating discussion it raises.”

The Twist (devastating dismissal):

“The observation that bounded thread delay would permit reclamation after a sufficiently long wait is close to trivial, and it is difficult to believe that the safe-reclamation literature has been unaware of this possibility.”

Wait, hold on.

You enjoyed reading it. It’s stimulating. It solves a long-standing problem. It demonstrates better performance.

But your problem is that the idea is “trivial” and “difficult to believe that no one thought about this before”?

Let’s test this logic and mental block:

  1. The observation is “trivial”
  2. The problem has existed for 30+ years, since Michael & Scott introduced their queue in 1996
  3. Hazard pointers were invented in 2004 to solve this
  4. Epoch-based reclamation in 2006
  5. Dozens of papers from 2006-2024 trying to reduce coordination overhead
  6. Production systems still use coordination-heavy schemes with 10-100× overhead
  7. We implemented bounded temporal protection, deployed it in production, and proved it works
  8. But it’s “trivial”

This is the “obvious in hindsight” fallacy.

Yes, bounded protection seems obvious AFTER we showed it works. But if it was trivial:

  • Why did nobody do it for 30 years?
  • Why do Boost, Moodycamel, and Intel TBB still use coordination-heavy schemes?
  • Why does every production queue use hazard pointers or epochs?
  • Why is Reviewer C calling it both “trivial” AND “intellectually stimulating” in the same review?

The reality is this:

30 years of previous work was done under one unquestioned assumption:

“Safe memory reclamation fundamentally requires inter-thread coordination.”

Papers and algorithms were built on this belief: Hazard pointers. Epoch schemes. Quiescent-state tracking. Global barriers. …

But then a paper says something very simple: For FIFO queues, coordination is not fundamental.

This questions a 30-year-old assumption. Now, you are not reviewing a paper anymore. You are reviewing doctrine. Because once you accept the bounded delay model presented in the paper, many coordination schemes look over-engineered. And that makes 30 years of work look conservative.

The work is simple yet provocative. And so is the pushback, which is psychological, not technical.

This is “disruption in simplicity” (stealing a quote from GigaOM’s Om Malik, incidentally about my previous work, VoicePHP).

The Real Culprits: Not Reviewers, But the System

There’s a pattern here. Notice what NO reviewer said:

  • “Your results are wrong”
  • “Your algorithm doesn’t work”
  • “Your production deployment failed”
  • “Your benchmarks are fraudulent”

Instead, the comments are totally irrelevant and out of scope. Most comments come across as someone trying to impress by throwing around terms without understanding them. Some are even immature, just showing off “Googled” knowledge:

  • “Why doesn’t it work for trees?” (It’s a queue paper)
  • “Compare with LCRQ” (Intel-only research artifact with limited production evidence)
  • “The idea is trivial” (Nobody did it for 30 years)
  • “Need more evaluation” (No amount would satisfy them)

Look at it this way: would you offer a job to Reviewer A, who works out of scope; Reviewer B, who thinks LCRQ (zero production use) is “state-of-the-art” and doesn’t know its intricacies; or Reviewer C, who admits enjoying and being stimulated by “trivial” work?

From which universe did ACM select these “expert” reviewers? Or did they do such sloppy reviews because of anonymity—they have no skin in the game, no fear of being discredited?

  • Irrelevant comparisons (trees)
  • Demanding comparisons with museum pieces (LCRQ)
  • Demonstrably false claims (“infeasible” despite production deployment)
  • Internal contradictions (“trivial” + “stimulating”)
  • Moving goalposts (no amount of evaluation would satisfy them)

Would they write these reviews that lack technical substantiation if their names were attached? This is where we need public accountability.

Public Accountability

The professor warned me not to publish this post for fear of blacklisting. But someone has to speak up, or these malpractices will continue. This isn’t just about our paper—it’s about a fundamentally broken system where only reviewers can provide feedback, not authors. That’s a serious problem.

Why can’t authors rate reviewers? Why isn’t there a public database? We need it, because the current system cannot distinguish between good and bad reviewers.

Good reviewers use anonymity to maintain objectivity, give honest criticism without fear, and reject famous authors’ bad work.

Bad reviewers use anonymity for entirely different reasons: to demand irrelevant comparisons, make false claims (“infeasible”), demonstrate their supposed expertise, and hide conflicts of interest.

What’s the solution?

We don’t necessarily need fully open reviews. We need accountability mechanisms that preserve anonymity, just like eBay’s buyer-seller ratings. Authors should rate reviewers on scope adherence, technical accuracy, relevance, evidence quality, and false statements. As patterns emerge, chairs can see “Reviewer X frequently makes false claims” versus “Reviewer Y provides thoughtful, accurate feedback.” Good reviewers build credibility through reputation systems, while bad ones face accountability. These mechanisms would transform academic review from an unaccountable black box into a system that rewards good reviewers and filters out bad ones—exactly like every other peer reputation system that works.

Evaluation Criteria for Systems Engineering Papers

There is another deeper issue at play: the widening gap between academic evaluation and production systems engineering.

Evaluating concurrent systems requires understanding that portability, deployability, and operational constraints are not secondary considerations—they are as critical as theoretical elegance. When reviewers dismiss these practical factors while demanding comparisons to academic prototypes that have never seen production deployment, one must question whether their evaluation reflects real systems engineering experience or simply citation patterns from academic literature.

Final Words

This isn’t about our paper. It’s about a broken system where “expert” reviewers face zero consequences for lazy, inaccurate, or contradictory reviews.

The algorithm will be published—maybe not at ICS, but somewhere. The code will be open-sourced. Production systems will keep using it. But the review system? It will keep rejecting good work while protecting bad reviewers.

Until we add accountability, nothing changes.

If any part of this post resonates with you, please share it with your academic circle. The system won’t fix itself.

Our Formal Rebuttal to ICS 2026

In the interest of complete transparency, here is the exact rebuttal we submitted to the ICS 2026 Program Chairs:

Dear Program Chairs,

We appreciate the reviewers' acknowledgment that our work is "conceptually novel," 
"intellectually stimulating," addresses an "important and long-standing problem," 
and their assignment of the highest expertise level (4/4).

However, the reviews contain significant technical errors, factual misstatements, 
and scope violations that fundamentally undermine their credibility. They also 
demonstrate concerning patterns suggesting sections of the paper went unread.

Rather than engage in a private rebuttal process that lacks accountability, we have 
documented these specific issues publicly:

https://motiwala.com/blog/acm-ics-2026-peer-review-without-accountability

This documentation serves as our formal rebuttal. The evidence clearly demonstrates 
that acceptance based on technical merit is the appropriate outcome. However, we 
leave that decision to ICS 2026.

We believe transparency and public accountability serve the field better than 
closed-door exchanges that shield sloppy reviewers from the consequences of demonstrably 
incorrect claims. The final section of the above document proposes specific accountability 
mechanisms that could prevent such issues in future reviews.

Respectfully,
Yusuf Motiwala