I have data on how many users post how many questions. For example,

`[UserCount, QuestionCount] [2, 100] [9, 10] [3, 80] ... ...`

This means that 2 users each posted 100 questions, 9 users each posted 10 questions, and so on. So, how can I determine whether the

`UserCount, QuestionCount`

distribution follows a power law?I found the poweRlaw package. However, I can only pass one group of numbers to do the evaluation. (The example provided in this package is word frequency.) So how do I use this package? Or do I have something wrong? I also have the data of each user’s question count, i.e.,

`[100, 100, 10, 10, 10 ... ]`

. If I pass this data to the package, what I will get?

**Answer**

According to Clauset et al., this is how you test the power law tail with `poweRlaw`

package:

- Construct the power law distribution object. In this case, your data is discrete, so use the discrete version of the class

`data <- c(100, 100, 10, 10, 10 ...) data_pl <- displ$new(data)`

- Estimate the $x_{min}$ and the exponent $\alpha$ of the power law, and assign them to the power law object

`est <- estimate_xmin(data_pl) data_pl$xmin <- est$xmin data_pl$pars <- est$pars`

the last two line can be rewritten as one line

```
data_pl$xmin <- est
```

Also, at this point, you can see the KS statistic:

```
est$KS
```

- KS statistic tells you how well power law distribution fits your data, but it doesn’t tell you how likely your data is drawn from power law. So you also need a $p$ value. This is how you do it:

`bs <- bootstrap_p(data_pl) bs$p`

This could take some time, so go and grab a cup of tea…

- Assuming you get a $p$ value and it’s greater than 0.05 or whatever your significant level is, you still need to exclude the possibility that no other alternative distribution fits the data better than power law. The
`poweRlaw`

package implements 3 other alternatives that you can compare with. Take log-normal for example:

`data_alt <- dislnorm$new(data) data_alt$xmin <- est$xmin data_alt$pars <- estimate_pars(data_alt) comp <- compare_distributions(data_pl, data_alt)`

Note that the $x_{min}$ of log-normal distribution is set to that of power law, because `compare_distributions`

function require the $x_{min}$s to be the same for both distributions. The `comp`

object has two fields that’s interesting: `comp$test_statistic`

indicates which one is a better fit, with positive number meaning `data_pl`

is better, and negative otherwise; `comp$p_two_side`

meaning how significant is the difference.

Repeat this step with `disexp`

, `dispois`

classes to compare power law with those alternatives.

**Attribution***Source : Link , Question Author : tThirday , Answer Author : Zebra Propulsion Lab*