Guidelines
Suggest edit

Use color rather than shape to encode groups for scatterplot mean comparisons

For comparing group averages, prefer color encoding over shape encoding for group membership on scatterplots to improve fidelity and mitigate misreading of mean position for overview readers.

  • purpose:refine
  • basis:empirical
  • task:compare
  • chart:scatter
  • quality:fidelity
  • lever:encoding
  • channel:color-hue:use
  • channel:shape:avoid

advice

Group encoding for mean comparison

Encode scatterplot groups with color when readers must compare their average position. For example, use two colors rather than circles versus triangles when the task is to judge which class sits higher on average.

reason

Why color helps class-average judgments

Readers first need to isolate the members of each class before they can estimate each class mean.

Mechanism: Color supports faster class selection in a scatterplot, which improves ensemble judgments about average position across the selected members.

Evidence: The paper reports a multiclass scatterplot study in which viewers compared mean position more accurately when groups were distinguished by color than when they were distinguished by shape; adding redundant color-plus-shape cues did not improve performance (Szafir et al., 2016).

Notes: The same study also found that adding more distractor classes did not impair these mean judgments in the way classic visual-search results might suggest.

context

Use when marker appearance carries group identity

  • User Goal: Compare which group has the higher or lower average position.
  • Task: Estimate or compare mean position between plotted classes.
  • Data: Quantitative x/y values plus categorical group membership.
  • Chart Setting: A scatterplot distinguishes classes by marker appearance.
  • Success Criterion: Readers can accurately compare group means without tracing individual points one by one.

exceptions

Do not use outside this encoding choice

Break it when: Group identity is not being encoded by marker appearance in the scatterplot. Why: The reported advantage is specifically a choice between color and shape markers for mean-position judgments.

costs

Tradeoffs of color group encoding

Sacrifice: You should not expect extra accuracy from adding redundant shape coding just for this task.
Risk: Relying on shape alone lowers accuracy for mean-position comparison.
Mitigation: Keep color as the primary group code when average position is the intended readout.

mistakes

Common failure mode

Mistake: Switching groups from color to shape in a scatterplot that is meant to support mean comparison. Why it fails: Shape coding performed worse than color for this specific judgment.

check

Check the group encoding choice

Failure Sign: Reviewers struggle to say which class is higher on average even though both classes are visible.
Quick Check: Create color-coded and shape-coded versions of the same scatterplot and ask the same mean-position question on both; keep the color-coded version if answers are cleaner.
Stronger Test: Compare accuracy on representative mean-position questions before and after the encoding change.

fix

Fix the encoding

  • Replace shape-only group coding with color coding for the compared classes.
  • Drop redundant shape coding if it was added only to improve this mean-comparison task.
  • Keep shape from becoming the sole group identifier when the chart’s job is comparing class averages.

References

Szafir, D. A., Haroz, S., Gleicher, M., & Franconeri, S. (2016). Four types of ensemble coding in data visualizations. Journal of Vision, 16(5), 11. https://doi.org/10.1167/16.5.11