I have a set of data, y and x. I would like to test the following hypothesis: there is a peak in y; that is, as x increases, y first increases and then decreases.
My first idea was to fit x and $x^2$ in an SLR. That is, if I find that the coefficient on x is significantly positive and the coefficient on $x^2$ is significantly negative, then I have support for the hypothesis. However, this checks for only one type of relationship (quadratic) and may not necessarily capture the existence of the peak.
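A minimal sketch of the quadratic check described above, using only the standard library. The data are simulated, and the helper names (`invert`, `quadratic_fit`) and the rough cutoff of 2 on the t statistics are assumptions made for illustration. Beyond the two sign conditions, it also checks that the fitted vertex $-b_1/(2b_2)$ falls inside the observed range of x, since a concave fit can otherwise be monotone over the data.

```python
import math
import random

def invert(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augment each row with the matching row of the identity matrix.
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [row[n:] for row in m]

def quadratic_fit(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x**2.

    Returns (coefficients, t statistics), each ordered as [b0, b1, b2].
    """
    X = [[1.0, xi, xi * xi] for xi in x]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
    sigma2 = sum(e * e for e in resid) / (len(y) - k)  # residual variance
    se = [math.sqrt(sigma2 * inv[i][i]) for i in range(k)]
    return beta, [b / s for b, s in zip(beta, se)]

# Simulated data (an assumption): a concave curve with its maximum at x = 5.
rng = random.Random(0)
x = [i / 10 for i in range(101)]
y = [5 + 6 * xi - 0.6 * xi**2 + rng.gauss(0, 1) for xi in x]

beta, t = quadratic_fit(x, y)
peak = -beta[1] / (2 * beta[2])  # vertex of the fitted parabola
# Rough cutoff of 2 on the t statistics, plus the vertex-in-range check.
supports_peak = t[1] > 2 and t[2] < -2 and min(x) < peak < max(x)
print(beta, t, peak, supports_peak)
```

Plain least squares is used here only to keep the sketch short. For a count response, a Poisson regression with the same two terms would probably be the more natural model, and the same sign and vertex checks would carry over.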
Then I thought of finding a region b of the (sorted) values of x such that b lies between two other regions of x, a and c, each containing at least as many points as b, and such that $\bar{y}_b > \bar{y}_a$ and $\bar{y}_b > \bar{y}_c$ significantly. If the hypothesis is true, we should expect many such regions b. Thus, if the number of such regions is sufficiently large, there should be support for the hypothesis.
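One way the region idea might be sketched is shown below. It is a simplification and rests on several assumptions: the regions are sliding windows of a fixed `width`, a and c are the equally sized windows immediately before and after b, "significantly" is approximated by a Welch t statistic above a hypothetical `threshold` of 2, and the data are simulated Poisson counts. The helper names (`welch_t`, `count_peak_regions`, `poisson`) are illustrative only.

```python
import math
import random
from statistics import mean, variance

def welch_t(u, v):
    """Welch's t statistic for mean(u) - mean(v), allowing unequal variances."""
    se = math.sqrt(variance(u) / len(u) + variance(v) / len(v))
    if se == 0:
        return 0.0  # both windows are constant; treat as no evidence
    return (mean(u) - mean(v)) / se

def count_peak_regions(x, y, width, threshold=2.0):
    """Count windows b whose mean y exceeds both neighbouring windows a and c."""
    ys = [yi for _, yi in sorted(zip(x, y))]  # order y by increasing x
    hits = 0
    for start in range(width, len(ys) - 2 * width + 1):
        a = ys[start - width:start]
        b = ys[start:start + width]
        c = ys[start + width:start + 2 * width]
        if welch_t(b, a) > threshold and welch_t(b, c) > threshold:
            hits += 1
    return hits

def poisson(lam, rng):
    """Draw one Poisson(lam) variate with Knuth's multiplication method."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

# Simulated counts (an assumption): the mean rises to a peak near x = 60.
rng = random.Random(1)
x = list(range(120))
y = [poisson(2 + 10 * math.exp(-((xi - 60) / 20) ** 2), rng) for xi in x]

peaked_hits = count_peak_regions(x, y, width=20)
print(peaked_hits)
```

Because neighbouring windows overlap, the individual comparisons are far from independent, so the raw count is not a p-value by itself. One hedged option is to calibrate it against the counts obtained after randomly permuting y, which mimics the absence of any relationship with x.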
Do you think I am on the right track to find a suitable test for my hypothesis? Or am I reinventing the wheel, and is there an established method for this problem? I will greatly appreciate your input.
UPDATE. My dependent variable y is a count (non-negative integer).
I was thinking of the smoothing idea also. But there is a whole area called response surface methodology that searches for peaks in noisy data (it primarily involves fitting local quadratic models to the data), and I recall a famous paper with “Bump hunting” in the title. Here are some links to books on response surface methodology. Ray Myers's books are particularly well written. I will try to find the bump hunting paper.
Although not the article I was looking for, here is a very relevant article by Jerry Friedman and Nick Fisher that deals with these ideas applied to high-dimensional data.
Here is an article with some online comments.
So I hope you at least appreciate my response. I think your ideas are good and on the right track, but yes, I do think you might be reinventing the wheel, and I hope you and others will look at these excellent references.