Alternative to chi-square in evaluating the similarity of two distributions (ordered categorical variables)

In my study I compare Finnish and Russian expressions for different parts of the day. I conducted a survey and asked people to refer to a time interval with some non-numeric expression (e.g. if something happens between 1pm and 3pm you might refer to it with the phrase “afternoon”). The thing is that Russian, unlike …

How to normalize the distance between two distributions

I’m creating a distance metric that is composed of multiple pairwise feature distances. The distance metric will be used in a clustering algorithm for a computer security problem; more specifically, the clustering algorithm will group together related “malicious instances”. Our hypothesis is that malicious instances will share similar characteristics (=features), as opposed to benign …

Not able to understand KL decomposition

The bias-variance decomposition usually applies to regression data. We would like to obtain a similar decomposition for classification, when the prediction is given as a probability distribution over C classes. Let P=[P1,…,PC] be the ground truth class distribution associated with a particular input pattern. Assume the random estimator of class probabilities ˉP=[ˉP1,…,ˉPC] for the same input …

Sentence sampling based on frequency

I have a database with 300k+ Russian sentences and their English translations. My goal is to use these sentences as flashcards, so the users can learn the top N most frequent Russian words (let’s assume N = 10k). A requirement is that the easiest sentences are shown first, and more complex sentences get slowly introduced …

How to compute distribution conditional on two random variables

I hope that this question is appropriate for this site; I have tried Math.SE, but haven’t had much luck so far. I am dealing with the following scenario: A fair coin is tossed, and we record the result “heads” or “tails” as random variable $Z$. If a head is observed, a random sample $X_1,\dots,X_n \sim …

Are there any quantitative metrics for how representative a sample is?

I’m interested in selecting a sample that is representative of a population. Additionally, I want to be able to quantitatively measure the representativeness of a sample. For example, is there a way to determine, for a sample size n, where n is some fraction of the population size x, how representative the sample is? Can …

How to predict parameters of time-variant distributions?

I am trying to create a simulation, and my probability distribution function changes over time (getting more skewed, etc.). We can have up to 5 consecutive years. Can you please explain where to look? What I do is first use EasyFit, then I see the changes in the parameters …

Distribution of $\frac{X_{1}}{X_{1}^{2}+X_{2}^{2}}$, where $(X_1,X_2)$ is bivariate normal?

What is the distribution of $\frac{X_{1}}{X_{1}^{2}+X_{2}^{2}}$ when $(X_1,X_2)$ has a bivariate normal distribution? Attribution: Question Author: Jingjings, Answer Author: Community

Compound Binomial distribution [closed]

I have come across the following expression: $$ T={m-y \choose \frac{m}{2}} \left(\dfrac{1}{2}\right)^{m-y}={m-y \choose \frac{m}{2}} \left(\dfrac{1}{2}\right)^{\frac{m}{2}}\left(\dfrac{1}{2}\right)^{\frac{m}{2}-y}$$ where $y$ is a binomial …