In linear regression, Y=Xβ, why is X called the design matrix? Can X be designed or constructed arbitrarily to some degree as in art?
To give an example in line with @neverKnowsBest’s response, consider that in a 23 factorial experiment there are 3 factors, each treated as categorical variables with 2 levels, and each possible combination of the factor levels is tested within each replication. If the experiment were only administered once (no replication) this design would require 23=8 runs. The runs can be described by the following 8×3 matrix:
where the rows represent the runs and the columns represent the levels of the factors:
(The first column represents the level of factor A, the second column B, and the third column C). This is referred to as the Design Matrix because it describes the design of the experiment. The first run is collected at the ‘low’ level of all of the factors, the second run is collected at the ‘high’ level of factor A and the ‘low’ levels of factors B and C, and so on.
This is contrasted with the model matrix, which if you were evaluating main effects and all possible interactions for the experiment discussed in this post would look like:
where the columns represent independent variables:
Although the two matrices are related the design matrix describes how data is collected, while the model matrix is used in analyzing the results of the experiment.
Montgomery, D. (2009). Design and Analysis of Experiments, 7th Edition. John Wiley & Sons Inc.