How to find a suitable association of color with data value in a visualization?

I’m working on a software project that involves creating a visualizer for flood simulations. As part of this project, I’ve created a water gradient that shows the water depth at each point. To decide which values map to which colors, I find the minimum and maximum depths in the data and distribute the colors evenly across that range.

However, these simulations often contain a few points with significantly deeper water than anywhere else. Because those points stretch the scale, most of the points on the map end up with very similar colors, which is not very informative and makes it hard to see which areas have deeper water.
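To make the problem concrete, here is a minimal Python sketch of the equal-width scheme described above (colors spread evenly between the observed minimum and maximum). It is a reconstruction for illustration, not the asker's actual code; the function name `linear_bins`, the choice of 10 colors, and the sample depths are all hypothetical.

```python
def linear_bins(depths, n_colors):
    """Assign each depth a color index 0..n_colors-1 by cutting the range
    [min, max] into n_colors intervals of equal width."""
    lo, hi = min(depths), max(depths)
    if hi == lo:
        # All depths identical: everything gets the first color.
        return [0] * len(depths)
    width = (hi - lo) / n_colors
    # The maximum lands exactly on the top edge, so clamp it into the last bin.
    return [min(int((d - lo) / width), n_colors - 1) for d in depths]


# Hypothetical depths: most between 1 and 2, plus one deep point at 12.
depths = [1.0, 1.2, 1.3, 1.5, 1.6, 1.8, 1.9, 2.0, 12.0]
print(linear_bins(depths, 10))
```

With these numbers every reading between 1 and 2 gets color 0 and only the deep point gets color 9, so nine of the ten colors go almost unused.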

My goal is to dedicate a larger range of colors to depths that occur more frequently. For example, if depths go from 0 to 12 but most depths are between 1 and 2, I want more color variation within that range than between, say, 11 and 12 or 4 and 5. It seems I need to use standard deviation or something involving the normal distribution to do this, but I’m a bit fuzzy on how these work and how I can use them to accomplish my goal.

Any help that can be provided will be appreciated. Thank you.

Answer

It sounds like you might want to dedicate each color in your palette to approximately the same amount of data.

To illustrate, here is a histogram of a set of 110 simulated depth readings:

[Figure: histogram of the 110 simulated depth readings]

Imagine this histogram were smoothed out. The smoothed curve could then be sliced into vertical segments of equal area, using as many slices as you like (I used 10 for this example). To keep the areas equal, the slices have to be skinny where the curve is high (that is, where there is a lot of data) and fat where the curve is low (where there is little data).

[Figure: kernel density of the depths, sliced into 10 equal-area pieces]
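The equal-area slicing of a smoothed histogram can be approximated numerically. This is a minimal sketch under assumptions the answer does not state: a Gaussian kernel, a user-chosen `bandwidth` controlling how much smoothing is applied, and a fixed evaluation grid; the name `kde_cutpoints` and the sample depths are hypothetical. It evaluates the smoothed density, accumulates its area with the trapezoidal rule, and interpolates the depths where the running area reaches 1/n, 2/n, and so on.

```python
import math


def kde_cutpoints(data, n_slices, bandwidth, grid_size=2000):
    """Return the n_slices - 1 interior cutpoints that split a Gaussian kernel
    density estimate of `data` into n_slices pieces of (nearly) equal area."""
    # Evaluate the smoothed density on an even grid reaching 4 bandwidths past the data.
    lo = min(data) - 4 * bandwidth
    hi = max(data) + 4 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    xs = [lo + i * step for i in range(grid_size)]
    scale = 1.0 / (len(data) * bandwidth * math.sqrt(2 * math.pi))
    dens = [
        scale * sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
        for x in xs
    ]

    # Running area under the curve (trapezoidal rule), rescaled so it ends at 1.
    area = [0.0]
    for i in range(1, grid_size):
        area.append(area[-1] + 0.5 * (dens[i - 1] + dens[i]) * step)
    total = area[-1]
    area = [a / total for a in area]

    # For each level k / n_slices, interpolate the depth where the running area reaches it.
    cuts = []
    j = 0
    for k in range(1, n_slices):
        target = k / n_slices
        while area[j + 1] < target:
            j += 1
        frac = (target - area[j]) / (area[j + 1] - area[j])
        cuts.append(xs[j] + frac * step)
    return cuts


# Hypothetical depths: a dense cluster between 1 and 2 plus one deep point.
depths = [1.0 + 0.1 * i for i in range(10)] + [12.0]
print(kde_cutpoints(depths, 4, bandwidth=0.2))
```

The bandwidth is a judgment call: a small value follows the data closely, while a large one smooths away detail and pushes cutpoints toward an even spread.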

One easy way to do the slicing is to plot the cumulative proportion of the data (the fraction of readings at or below each depth) against depth. Slice the vertical axis into even intervals, then read off the depths where the slice lines cross the curve: use those as the cutpoints for visualizing depths.

[Figure: CDF (cumulative proportion versus depth)]
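Reading cutpoints off the cumulative-proportion curve can be sketched in Python. This assumes one common convention for drawing that curve, joining the sorted values at heights 0, 1/(n-1), ..., 1 with straight lines; the answer does not specify a convention. The name `ecdf_cutpoints` and the sample depths are hypothetical. The standard library's `statistics.quantiles` with `method="inclusive"` uses the same convention, so it is printed alongside for comparison.

```python
import statistics


def ecdf_cutpoints(data, n_colors):
    """Slice the cumulative-proportion axis into n_colors even intervals and
    return the depths where the slice lines cross the cumulative curve."""
    xs = sorted(data)
    n = len(xs)
    cuts = []
    for k in range(1, n_colors):
        level = k / n_colors        # evenly spaced height on the vertical axis
        pos = level * (n - 1)       # that height as a position in the sorted list
        i = int(pos)
        frac = pos - i
        upper = xs[min(i + 1, n - 1)]
        # Linear interpolation between neighboring sorted values.
        cuts.append(xs[i] + frac * (upper - xs[i]))
    return cuts


depths = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.8, 2.0, 4.5, 12.0]
print(ecdf_cutpoints(depths, 5))
print(statistics.quantiles(depths, n=5, method="inclusive"))
```

For these depths all four cutpoints fall in the 1 to 2 range even though the data reach 12, which is exactly the extra color resolution the question asks for.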

The algorithm for computing the cutpoints from the data is simple to write in almost any programming language: sort the values, break the sorted list into groups of approximately equal size, and choose each cutpoint to separate the largest value in one group from the smallest value in the group that follows it.
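The recipe above translates almost line for line into Python. In this sketch, placing each cutpoint at the midpoint between the two neighboring values is an assumption (any value strictly between them would separate the groups), and the names `group_cutpoints` and `color_index` and the sample depths are hypothetical. `color_index` uses the standard `bisect` module to turn a depth into a color number once the cutpoints are known.

```python
from bisect import bisect_right


def group_cutpoints(data, n_groups):
    """Sort the values, split them into n_groups runs whose sizes differ by at
    most one, and put each cutpoint midway between the largest value of one
    group and the smallest value of the next."""
    xs = sorted(data)
    n = len(xs)
    if n < n_groups:
        raise ValueError("need at least as many values as groups")
    # starts[g] is the index of the first value in group g; integer division keeps sizes balanced.
    starts = [g * n // n_groups for g in range(n_groups + 1)]
    cuts = []
    for g in range(1, n_groups):
        largest_before = xs[starts[g] - 1]
        smallest_after = xs[starts[g]]
        cuts.append((largest_before + smallest_after) / 2)
    return cuts


def color_index(depth, cuts):
    """Map a depth to a color number 0..len(cuts): how many cutpoints lie at or below it."""
    return bisect_right(cuts, depth)


depths = [1.0, 1.1, 1.2, 1.3, 1.5, 1.6, 1.8, 1.9, 2.0, 12.0]
cuts = group_cutpoints(depths, 5)
print(cuts)
print([color_index(d, cuts) for d in depths])
```

Here each of the five colors covers two readings, and the deep point at 12 simply shares the top color with 2.0 instead of claiming most of the palette. If many values are tied at a group boundary, the midpoint equals that value and all the tied readings land in the upper color.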

Attribution
Source: Link, Question Author: SethGunnells, Answer Author: whuber
