I have quite a complicated data set to analyze, and I can’t find a good solution for it.

Here is the thing:

1. The raw data is essentially insect song recordings. Each song is made of several bursts, and each burst is made of sub-units. All individuals have been recorded for 5 minutes. The number of bursts and their position in the recording can be very different between individuals, as can the number of sub-units per burst.

2. I have the carrier frequency (fundamental frequency) of each sub-unit, and that’s what I want to analyze.

My problems:

1. The frequencies within a burst are obviously not independent: although the frequency is fairly stable, the frequency of sub-unit n-1 influences that of sub-unit n.

2. The bursts within a recording are also not independent.

3. They are even less independent because the frequency drops with time (the individual gets tired of singing, so the frequency of the song gets lower and lower). The drop seems to be linear.

4. Nesting: I have 3 replicate populations for each of two locations, A and B. So I have A1, A2, A3 and B1, B2, B3.

What I would like to do:

1. Characterize the difference in frequency between my two locations (test it statistically).

2. Characterize the frequency drop in each of the two locations (see if it drops faster in one of them).

How to do it: Well, that’s why I need help: I don’t know. It seems that my case combines problems that are usually not seen together. I’ve read about mixed models, GAMs, ARIMA, and random and fixed effects, but I can’t be sure of the best way to do it.

When I graph it though (frequency ~ sub-unit number n), the difference is very clear between the two locations. I also have to take other variables into account, like the temperature (which makes the frequency higher), etc.

I thought about:

Nesting the individuals within the replicate they are from, and nesting the replicate within the location (individual/replicate/location).

Use a random ‘burst’ effect, so I take into account the variability within each burst.

Use a fixed ‘burst position in recording’ effect, to measure the frequency dropping (hoping it is actually linear).

Would it be correct?

Is there a special type of model I could use for this kind of scenario?

**Answer**

These are just some general suggestions you may find helpful, more a roadmap than a recipe.

- My instinct would be to build a Bayesian hierarchical model, because it lends itself to iterative model development. I don’t think you’ll find an existing model which has all the bells and whistles you’re after. But this makes hypothesis testing harder, and I don’t know how necessary hypothesis testing is for you.
- It sounds like you’ve got a little informal, generative model in your head about how the insects make their songs; you say things like “getting tired”, and you know that the temperature makes the frequency higher, presumably because the animal has more energy.
- The problem sounds way too complex to model “in one shot”. I think you’ll have to build something up piecemeal. I would start with some “strong simplifying assumptions”, i.e., throw away most of the complexity of the dataset, with a plan to add it back in later once you’ve got a simple model which works.

So to begin, I would preprocess the sub-unit frequencies burst by burst into a (mean frequency, frequency trend) pair, estimated with OLS, and model the mean and trend of each burst rather than the sub-units themselves. Or you could use (mean, trend, number of sub-units), if the number of sub-units relates to how tired the insect is getting. Then build a Bayesian hierarchical model where the distribution of the mean and trend of a burst is determined by the mean and trend of the recording, which in turn is determined by the mean and trend of the location.
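As a minimal sketch of this preprocessing step, assuming each burst is simply a list of sub-unit frequencies in order, the per-burst summary could be computed with an ordinary least-squares line fit. The function name `summarize_burst` and the example numbers are hypothetical, not part of the original data.

```python
import numpy as np

def summarize_burst(freqs):
    """Reduce one burst to (mean, trend, n_subunits).

    The trend is the OLS slope of frequency against sub-unit index,
    i.e. the change in frequency per sub-unit within the burst.
    """
    freqs = np.asarray(freqs, dtype=float)
    n = freqs.size
    if n == 0:
        raise ValueError("a burst needs at least one sub-unit")
    if n == 1:
        # A single sub-unit has no estimable trend.
        return float(freqs[0]), float("nan"), n
    idx = np.arange(n)
    # polyfit with degree 1 returns [slope, intercept].
    slope, _intercept = np.polyfit(idx, freqs, 1)
    return float(freqs.mean()), float(slope), n

# Made-up recording: one inner list of sub-unit frequencies (Hz) per burst.
recording = [
    [5000.0, 4998.0, 4996.0, 4994.0],
    [4990.0, 4989.0, 4988.0],
]
summaries = [summarize_burst(burst) for burst in recording]
```

Each tuple in `summaries` then stands in for a whole burst in the hierarchical model, which is the "strong simplifying assumption" described above.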

Then add temperature in as a factor for the recording mean/trend.
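One way to write this structure down, purely as a sketch: let $y_b = (m_b, t_b)$ be the mean and trend of burst $b$, $r[b]$ the recording it belongs to, $\ell[r]$ the location of recording $r$, and $T_r$ the temperature of the recording. The normal distributions, the linear temperature effect $\beta$, and the covariance symbols are all assumptions made for illustration; the replicate populations could be inserted as a further level between recording and location.

```latex
\begin{aligned}
y_b      &\sim \mathcal{N}\!\left(\mu_{r[b]},\ \Sigma_{\text{burst}}\right)
         && \text{burst mean and trend given its recording} \\
\mu_r    &\sim \mathcal{N}\!\left(\theta_{\ell[r]} + \beta\, T_r,\ \Sigma_{\text{rec}}\right)
         && \text{recording level, shifted by temperature} \\
\theta_\ell &\sim \mathcal{N}\!\left(\theta_0,\ \Sigma_{\text{loc}}\right)
         && \text{location level, } \ell \in \{A, B\}
\end{aligned}
```

Under these assumptions, the difference $\theta_A - \theta_B$ carries the location effect on both the mean frequency and the within-burst trend.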

This simple model should allow you to see the mean and trend of the individual bursts in a recording as determined by the temperature and the location. Try to get this to work.

Then I would try to estimate the difference between the mean frequencies of successive bursts (or the between-burst trend, obtained by dividing that difference by the quiet time between the bursts), adding this as a variable determined by the location and recording. The next step is an AR model of the burst mean within a recording.
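A rough sketch of both quantities, assuming you already have one mean frequency per burst and the quiet time between consecutive bursts: the helper names, the seconds unit, and the example values are hypothetical, and the AR(1) coefficient here is a plain least-squares estimate rather than part of a full Bayesian fit.

```python
import numpy as np

def between_burst_trend(burst_means, gaps):
    """Change in mean frequency between consecutive bursts per unit of quiet time."""
    burst_means = np.asarray(burst_means, dtype=float)
    gaps = np.asarray(gaps, dtype=float)
    if gaps.size != burst_means.size - 1:
        raise ValueError("need exactly one gap per pair of consecutive bursts")
    return np.diff(burst_means) / gaps

def ar1_coefficient(series):
    """Least-squares AR(1) coefficient of the mean-centred series."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    # Regress x[t] on x[t-1] without an intercept.
    return float(np.dot(x[1:], x[:-1]) / np.dot(x[:-1], x[:-1]))

# Made-up example: four burst means (Hz) and the three quiet gaps (s) between them.
burst_means = [5000.0, 4990.0, 4985.0, 4975.0]
gaps = [10.0, 5.0, 20.0]

trends = between_burst_trend(burst_means, gaps)
phi = ar1_coefficient(burst_means)
```

Because the burst means also drift downward over the recording, it is probably safer to estimate the AR coefficient on the residuals after removing that drift, so that the trend is not mistaken for autocorrelation.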

Given some priors and some very strong assumptions about the nature of bursts (that all info is given by mean and trend), this basic model will tell you:

- how is the mean frequency of a burst different location by location and temp by temp
- how is the within-burst trend different location by location and temp by temp
- how is the outside-burst trend different location by location and temp by temp

Once you’ve got something like this to work then it might be time to model the sub-units themselves and throw away the original OLS estimate. I’d look at the data at this point to get an idea of what kind of time-series model might fit, and model the parameters of the time-series model rather than (mean,trend) pairs.

**Attribution**

*Source: Link, Question Author: Joe, Answer Author: Patrick Caldon*