Split broad model comparisons into separate condition panels

For grouped-result method comparisons that span several datasets or evaluation scores, prefer a multi-view structure over a single aggregate ranking. Separate views give analysts more insight and expose rank reversals that an aggregate would hide.

purpose:select
basis:empirical
task:compare
scope:grouped-result
structure:multi-view:use
structure:single-view:avoid
quality:insight:use
lever:layout-structure

advice

Condition-separated panels

Split a broad method comparison into separate panels for each dataset and each score instead of collapsing everything into one overall ranking. For example, when the same models change order across conditions, show one ranking for each image dataset, one for each video dataset, and separate score panels for CC (linear correlation coefficient), NSS (normalized scanpath saliency), and shuffled AUC.

reason

Why separate panels reveal the real comparison

A single aggregate ranking hides where methods trade places. Separate panels show both the stable leaders and the conditions that change the story.

Mechanism: Condition-separated panels make rank shifts visible, so readers can see whether a method is consistently strong or only strong on a particular dataset or score.

Evidence: Borji et al. (2013) find that model rankings vary across datasets and evaluation scores. They argue that fair evaluation requires comparison over several datasets because dataset statistics differ, and that comparisons based on a single score are hard to interpret directly.

context

Use when one comparison spans many conditions

User Goal: Summarize the state of the art without hiding condition-specific behavior.
Task: Compare many models across several benchmarks or scoring methods.
Data: Multiple image datasets, video datasets, or multiple evaluation scores for the same models.
Chart Setting: Benchmark figures, review tables, or leaderboard graphics.
Audience: Analysts and reviewers assessing relative performance.
Success Criterion: Readers can see both overall strength and condition-dependent reversals.

exceptions

Do not use it when the study is intentionally single-condition

Break it when: The comparison is intentionally restricted to one stimulus family and one task-specific score.
Why: The paper uses a single NSS view for synthetic odd-target displays because that one score matches that one evaluation goal.

costs

What this costs

Sacrifice: The figure takes more space and is slower to scan at first glance.
Risk: Too many panels can bury the main result.
Mitigation: Keep each panel to one condition and reserve any overall summary for after the per-condition views.

mistakes

Common layout failure

Mistake: Collapse several datasets and scores into one average rank.
Why it fails: The combined view hides cases where models exchange positions across datasets or across scoring methods.

check

How to decide between one panel and several

Failure Sign: The same model moves noticeably up or down when you switch dataset or score.
Quick Check: Compare the top models across conditions before averaging; if the order changes, a single ranking is too coarse.
Stronger Test: Build a one-panel aggregate and a condition-separated multi-view; if the headline winner or ordering changes, keep the multi-view.
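
The quick check and the stronger test can be sketched in a few lines of Python. This is a minimal illustration, not a procedure taken from the cited paper. The function names (rank_order, find_reversals, aggregate_order), the dataset and model names, and all score values are hypothetical. It assumes that higher scores are better and that every condition reports the same set of models.

```python
from itertools import combinations

def rank_order(scores):
    """Return model names best-first (assumes higher score = better)."""
    return sorted(scores, key=lambda m: (-scores[m], m))

def find_reversals(results):
    """Quick check: list model pairs whose order flips between two conditions."""
    reversals = []
    for c1, c2 in combinations(sorted(results), 2):
        shared = sorted(set(results[c1]) & set(results[c2]))
        for a, b in combinations(shared, 2):
            d1 = results[c1][a] - results[c1][b]
            d2 = results[c2][a] - results[c2][b]
            if d1 * d2 < 0:  # a leads in one condition, b in the other
                reversals.append((a, b, c1, c2))
    return reversals

def aggregate_order(results):
    """One-panel aggregate: order models by mean rank across conditions."""
    models = sorted(next(iter(results.values())))
    mean_rank = {
        m: sum(rank_order(r).index(m) + 1 for r in results.values()) / len(results)
        for m in models
    }
    return sorted(models, key=lambda m: (mean_rank[m], m))

# Made-up scores for illustration only; keys are (dataset, score) conditions.
results = {
    ("dataset_A", "NSS"): {"model_X": 1.20, "model_Y": 1.05, "model_Z": 0.80},
    ("dataset_B", "NSS"): {"model_X": 0.90, "model_Y": 1.10, "model_Z": 0.70},
}

reversals = find_reversals(results)

# Stronger test: does any per-condition order differ from the aggregate order?
aggregate = aggregate_order(results)
keep_multi_view = any(rank_order(r) != aggregate for r in results.values())
```

In this made-up data, model_X and model_Y tie on mean rank even though they swap places between the two datasets, so the aggregate order depends only on the tie-break. That is the situation in which the multi-view should be kept.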

fix

What to change

Create one panel per dataset instead of averaging datasets into one rank list.
Create one panel per evaluation score instead of mixing scores into one summary ordering.
Separate still-image and video results into distinct sections.
Move any overall average to a secondary summary after the condition-specific panels.
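
The changes listed above can be sketched as a grouping step that runs before any plotting. This is one possible arrangement under stated assumptions, not a prescribed implementation. The function build_panels, the record keys (stimulus, dataset, score, model, value), and the sample values are all hypothetical, and higher scores are assumed to be better. Rendering is left to whatever plotting library is in use.

```python
from collections import defaultdict

def build_panels(records):
    """Group flat records into one panel per (stimulus, dataset, score).

    Panels come back ordered by stimulus type (image before video), with a
    mean-rank summary panel placed last as a secondary view.
    """
    grouped = defaultdict(dict)
    for r in records:
        key = (r["stimulus"], r["dataset"], r["score"])
        grouped[key][r["model"]] = r["value"]

    panels = []
    for stimulus, dataset, score in sorted(grouped):
        values = grouped[(stimulus, dataset, score)]
        # Rank within this single condition only (higher value = better).
        ranking = sorted(values, key=lambda m: (-values[m], m))
        panels.append(
            {"section": stimulus, "title": f"{dataset} / {score}", "ranking": ranking}
        )

    # Secondary summary: mean rank over the condition panels, appended last.
    positions = defaultdict(list)
    for panel in panels:
        for pos, model in enumerate(panel["ranking"], start=1):
            positions[model].append(pos)
    overall = sorted(
        positions, key=lambda m: (sum(positions[m]) / len(positions[m]), m)
    )
    panels.append(
        {"section": "summary", "title": "mean rank (secondary)", "ranking": overall}
    )
    return panels

# Made-up records for illustration only.
records = [
    {"stimulus": "image", "dataset": "dataset_A", "score": "NSS", "model": "model_X", "value": 1.20},
    {"stimulus": "image", "dataset": "dataset_A", "score": "NSS", "model": "model_Y", "value": 1.00},
    {"stimulus": "image", "dataset": "dataset_A", "score": "CC", "model": "model_X", "value": 0.40},
    {"stimulus": "image", "dataset": "dataset_A", "score": "CC", "model": "model_Y", "value": 0.45},
    {"stimulus": "video", "dataset": "dataset_V", "score": "NSS", "model": "model_X", "value": 0.90},
    {"stimulus": "video", "dataset": "dataset_V", "score": "NSS", "model": "model_Y", "value": 1.10},
]

panels = build_panels(records)
for panel in panels:
    print(panel["section"], "|", panel["title"], "|", " > ".join(panel["ranking"]))
```

Keeping the summary as the final panel means a reader meets the per-condition rankings first and treats the average as a recap rather than as the headline.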

References

Borji, A., Sihite, D. N., & Itti, L. (2013). Quantitative Analysis of Human-Model Agreement in Visual Saliency Modeling: A Comparative Study. IEEE Transactions on Image Processing, 22(1), 55–69. https://doi.org/10.1109/TIP.2012.2210727