Make shuffled AUC the primary comparison panel

For grouped-result comparisons of fixation-prediction models, use a shuffled-AUC-first layout on multi-view benchmark panels. This improves fidelity and reduces the risk that analysts misread center-bias and border effects as model quality.

purpose:refine
basis:empirical
scope:grouped-result
structure:multi-view
quality:fidelity:use
lever:layout-structure
communication:credibility

advice

Score-panel priority

Make the shuffled-AUC panel the primary readout when showing fixation-prediction model rankings. For example, rank models by shuffled AUC, place CC and NSS in secondary panels, and treat a strong centered-Gaussian score under CC or NSS as evidence of center bias rather than model quality.

reason

Why shuffled AUC should lead

Center-biased fixation datasets can make a weak center-seeking method look strong under scores that are sensitive to center preference, such as CC and NSS. A primary shuffled-AUC view keeps attention on off-center fixations, which are harder and more informative to predict.

Mechanism: Shuffled AUC discounts center bias by using fixations from other images as negatives, so the headline ranking is less driven by central fixation density and border artifacts.
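The mechanism above can be sketched in a few lines. This is a minimal illustration, not the paper's exact procedure: the function name, the argument names, and the (row, col) fixation convention are assumptions, and the AUC is computed in its rank-based form rather than by sweeping thresholds.

```python
import numpy as np

def shuffled_auc(saliency, fixations, other_fixations):
    """Rank-based AUC for one image (illustrative sketch).

    saliency:        2-D array, the model's saliency map for this image.
    fixations:       (n, 2) integer array of (row, col) fixations on this image.
    other_fixations: (m, 2) integer array of (row, col) fixations recorded on
                     other images, used as the negative set.
    """
    # Positives: saliency values where viewers actually looked on this image.
    pos = saliency[fixations[:, 0], fixations[:, 1]]
    # Negatives: saliency values at locations fixated on OTHER images.
    neg = saliency[other_fixations[:, 0], other_fixations[:, 1]]
    # AUC as P(pos > neg) + 0.5 * P(pos == neg) over all positive/negative pairs.
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)
```

Because the negatives come from the same center-clustered fixation distribution as the positives, a map that encodes only center preference separates the two sets poorly and should score near chance (0.5).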

Evidence: The paper reports that CC and NSS are sensitive to center preference, that shuffled AUC tackles center bias and border effects, and that shuffled AUC is the best option for saliency-model comparison on fixation datasets (Borji et al., 2013).

Notes: The paper still reports CC and NSS, but not as the preferred basis for judging model quality.

context

Use when fixation benchmarks are center-biased

User Goal: Fairly compare fixation-prediction methods.
Task: Rank models on natural image or video eye-tracking benchmarks.
Data: Human fixation datasets with strong center clustering.
Chart Setting: Multi-panel benchmark figures or leaderboards with several evaluation scores.
Audience: Analysts and reviewers comparing model quality.
Success Criterion: The headline ranking reflects non-trivial fixation prediction rather than dataset bias.

exceptions

Do not use it as the primary readout for single-target search arrays

Break it when: The comparison is limited to synthetic search patterns with one tagged target location and the goal is pure target detection.
Why: The paper uses NSS there because a single target location makes target-hit strength the relevant score.

costs

What this costs

Sacrifice: The primary panel becomes less comparable to older results that emphasized CC or NSS.
Risk: Readers may ignore secondary score panels they already know.
Mitigation: Keep CC and NSS visible as secondary panels, but make the ranking and takeaway come from shuffled AUC.

mistakes

Common score-panel failure

Mistake: Use CC or NSS as the headline ranking on center-biased fixation datasets.
Why it fails: A centered baseline can score well there, so the chart overstates true fixation-prediction quality.

check

How to test the panel order

Failure Sign: A centered-Gaussian baseline sits near the top of the headline ranking.
Quick Check: Re-rank the same models by shuffled AUC and see whether the order changes materially.
Stronger Test: Compare the headline conclusion from CC or NSS against shuffled AUC; if the main winners change, the score order is hiding center-bias effects.
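The failure sign and the quick check can be automated along these lines. This is a sketch under stated assumptions: the helper names, the `"sAUC"` key, the `"centered_gaussian"` baseline label, and the `top_k` cutoff are all hypothetical, and `scores` is assumed to map each model name to a dict of per-metric values that the caller supplies.

```python
def rank_by(scores, metric):
    """Return model names sorted best-first by the given metric."""
    return sorted(scores, key=lambda model: scores[model][metric], reverse=True)

def center_bias_check(scores, headline, baseline="centered_gaussian", top_k=3):
    """Compare a headline-metric ranking against the shuffled-AUC ranking.

    scores:   {model_name: {metric_name: value}}, higher is better.
    headline: the metric currently used for the headline ranking, e.g. "CC".
    """
    headline_order = rank_by(scores, headline)
    sauc_order = rank_by(scores, "sAUC")
    return {
        # Failure sign: the centered baseline sits near the top of the headline.
        "baseline_in_top_k": baseline in headline_order[:top_k],
        # Stronger test: does the main winner change under shuffled AUC?
        "winner_changes": headline_order[0] != sauc_order[0],
        "headline_order": headline_order,
        "sauc_order": sauc_order,
    }
```

If either flag comes back true for a CC or NSS headline, that suggests the current score order is rewarding center bias and the shuffled-AUC panel should lead.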

fix

What to change

Promote the shuffled-AUC ranking to the top or leftmost panel.
Move CC and NSS into secondary panels in the same figure.
Reword the main takeaway to cite shuffled AUC first.
If CC or NSS stay prominent, add an explicit centered-Gaussian baseline to show how much center bias they reward.
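For a text leaderboard, the first two changes above amount to sorting rows by shuffled AUC and putting that column leftmost. The sketch below shows one way to do it; the function name, the column labels, and the two-decimal formatting are assumptions made for illustration, and `scores` is assumed to map each model name to a dict of per-metric values.

```python
def leaderboard(scores, primary="sAUC", secondary=("CC", "NSS")):
    """Render a plain-text leaderboard with the primary score leftmost
    and rows sorted best-first by that score."""
    columns = [primary, *secondary]
    order = sorted(scores, key=lambda model: scores[model][primary], reverse=True)
    width = max(len("model"), *(len(model) for model in order))
    header = f"{'model':<{width}}  " + "  ".join(f"{c:>6}" for c in columns)
    rows = [
        f"{model:<{width}}  "
        + "  ".join(f"{scores[model][c]:>6.2f}" for c in columns)
        for model in order
    ]
    return "\n".join([header, *rows])
```

Including the centered-Gaussian baseline as an ordinary row keeps its CC and NSS values visible in the secondary columns while its shuffled-AUC rank determines where it appears.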

References

Borji, A., Sihite, D. N., & Itti, L. (2013). Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study. IEEE Transactions on Image Processing, 22(1), 55–69. https://doi.org/10.1109/TIP.2012.2210727