Guidelines
Suggest edit

Add subgroup views beside the aggregate scatterplot when trends may reverse

For grouped relationship analysis, use a multi-view subgroup breakdown on scatterplots to improve trust and mitigate the risk that the aggregate trend hides within-group reversals for readers interpreting associations.

  • purpose:refine
  • basis:empirical
  • task:relate
  • scope:grouped-result
  • chart:scatter
  • structure:multi-view:use
  • quality:trust
  • lever:layout-structure

advice

Subgroup breakdown next to the aggregate

Add subgroup views beside the aggregate scatterplot when the data are grouped and the within-group relationship may differ from the overall relationship. For example, keep the combined scatterplot, but also show each group separately so an overall positive correlation is not mistaken for positive within-group trends when each subgroup slopes downward.

reason

Why the subgroup breakdown works

Aggregate patterns can contradict subgroup patterns. Showing both levels lets readers see whether the overall direction is stable or whether combining groups has changed the apparent relationship.

Mechanism: Separate subgroup views expose within-group slopes that are hidden when all points are pooled into one correlation or one combined scatterplot.

Evidence: The paper’s Simpson’s paradox example preserves a positive overall Pearson correlation while arranging the grouped subsets so that each individual subgroup has a negative correlation, demonstrating that aggregate and within-group trends can disagree (Matejka & Fitzmaurice, 2017).

context

When this applies

  • User Goal: Interpret a relationship in grouped data.
  • Task: Compare the overall association with the within-group association.
  • Data: Paired quantitative observations divided into meaningful groups.
  • Chart Setting: The current scatterplot pools all groups into one aggregate view.
  • Audience: Readers may otherwise rely on one overall trend or one reported correlation.
  • Success Criterion: Readers can see whether group-level trends match or reverse the aggregate trend.

exceptions

When not to add subgroup views

Break it when: The data do not contain meaningful groups to compare. Why: There are no within-group trends to contrast against the aggregate view.

costs

Tradeoffs of the subgroup breakdown

Sacrifice: You use more space than a single aggregate scatterplot. Risk: Readers may focus on only one level and miss the other. Mitigation: Keep the aggregate view and subgroup views together.

mistakes

Common failure mode

Mistake: Plot only the combined point cloud when groups are a relevant unit of interpretation. Why it fails: The aggregate direction can hide or reverse the direction present inside the groups.

check

How to test the refinement

Failure Sign: The aggregate correlation is clear, but the within-group direction is unknown. Quick Check: Compare the combined scatterplot with subgroup views; if subgroup slopes differ from the aggregate slope, keep both levels visible. Stronger Test: Inspect the overall correlation and each subgroup correlation side by side before interpreting one pooled trend.

fix

What to change

  • Split the grouped data into subgroup scatterplots alongside the aggregate scatterplot.
  • Distinguish the groups before interpreting a single overall correlation.
  • Report the aggregate relationship only after checking whether subgroup trends agree with it.

References

Matejka, J., & Fitzmaurice, G. (2017). Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing. Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, 1290–1294. https://doi.org/10.1145/3025453.3025912