I’m working on a project that involves 14 variables and 345,000 observations of housing data (things like year built, square footage, sale price, county of residence, etc.). I’m looking for good graphical techniques and for R libraries that implement them well.
I’m already exploring what ggplot and lattice can do, and I’m thinking of using violin plots for some of my numerical variables.
What other packages would people recommend for displaying a large amount of either numerical or factor-typed variables in a clear, polished, and most importantly, succinct manner?
The best “graph” is so obvious nobody has mentioned it yet: make maps. Housing data depend fundamentally on spatial location (according to the old saw about real estate), so the very first thing to do is make a clear, detailed map of each variable. Doing this well with a third of a million points really requires an industrial-strength GIS, which can make short work of the process. After that it makes sense to go on to probability plots and boxplots to explore the univariate distributions, and to scatterplot matrices, wandering schematic boxplots, etc., to explore dependencies. But the maps will immediately suggest what to explore, how to model the relationships in the data, and how to break the data up geographically into meaningful subsets.
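If a GIS is not at hand, a rough first pass at the same workflow can be sketched in R with ggplot2. This is only an illustration under stated assumptions, not a substitute for a proper GIS: the data frame `homes` and its columns `lon`, `lat`, `price`, and `county` are hypothetical, the values are simulated, and it assumes each record has been geocoded (the question mentions only county, in which case a county-level map would be the analogue). Summarising by grid cell rather than drawing every point keeps the map readable at several hundred thousand observations.

```r
library(ggplot2)

set.seed(1)
n <- 5000  # small simulated stand-in for the 345,000 real records

# Hypothetical data: coordinates, a county label, and a sale price
homes <- data.frame(
  lon    = runif(n, -78, -76),
  lat    = runif(n, 38, 40),
  county = factor(sample(c("A", "B", "C", "D"), n, replace = TRUE))
)
# Simulated prices that rise toward the north-east, purely so the map shows a pattern
homes$price <- exp(12 + 0.3 * (homes$lon + 77) + 0.3 * (homes$lat - 39) +
                   rnorm(n, sd = 0.4))

# Map: median sale price within each cell of a 60 x 60 grid
price_map <- ggplot(homes, aes(lon, lat, z = price)) +
  stat_summary_2d(fun = median, bins = 60) +
  coord_quickmap() +
  labs(x = "Longitude", y = "Latitude", fill = "Median price")

# Normal probability plot of log price (univariate distribution)
price_qq <- ggplot(homes, aes(sample = log(price))) +
  stat_qq() +
  stat_qq_line()

# Boxplots of price by county, on a log scale
price_box <- ggplot(homes, aes(county, price)) +
  geom_boxplot() +
  scale_y_log10()

print(price_map)
print(price_qq)
print(price_box)
```

The same map can be repeated for each of the other variables by changing the `z` aesthetic and, for categorical variables, the summary function.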