# Why is Ordinary Least Squares performing better than Poisson regression?

I’m trying to fit a regression to explain the number of homicides in each district of a city.
Although I know that my data follows a Poisson distribution, I tried to fit an OLS like this:

$\log(y+1) = \alpha + \beta X + \epsilon$

Then, I also tried (of course!) a Poisson regression. The problem is that the OLS regression gives better results: the pseudo-$R^2$ is higher (0.71 vs. 0.57) and the RMSE is lower (3.8 vs. 8.88; both standardized to the same units).

Why? Is this normal? What's wrong with using OLS regardless of the distribution of the data?
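To make the setup concrete, here is a toy sketch of the two fits on simulated Poisson counts (the data and the single covariate `x` are purely illustrative, not my real features):

```r
set.seed(1)
n <- 200
x <- rnorm(n)                              # one illustrative covariate
y <- rpois(n, lambda = exp(1 + 0.5 * x))   # simulated Poisson counts

ols  <- lm(log(y + 1) ~ x)                 # OLS on log(y + 1)
pois <- glm(y ~ x, family = poisson)       # Poisson GLM (log link)

# RMSE of both models, back on the original count scale
rmse_ols  <- sqrt(mean((y - (exp(predict(ols)) - 1))^2))
rmse_pois <- sqrt(mean((y - predict(pois, type = "response"))^2))
c(OLS = rmse_ols, Poisson = rmse_pois)
```

Note that the naive back-transform `exp(fitted) - 1` ignores the retransformation bias of the log scale, which may be one reason such RMSE comparisons are misleading.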

**Edit:**
Following the suggestions of kjetil b halvorsen and others, I fitted two models: OLS and a negative binomial GLM (NB). I started with all the features I have, then recursively removed the features that were not significant, one by one.
The OLS model is

$\sqrt{\frac{\text{crime}}{\text{area}}} = \alpha + \beta X + \epsilon$

with `weights = area`.

```r
summary(w <- lm(sqrt(num/area) ~ RNR_nres_non_daily + RNR_nres_daily + hType_mix_std +
                  area_filtr + num_community_places + num_intersect + pop_rat_num +
                  employed + emp_rat_pop + nden_daily + nden_non_daily + bld_rat_area +
                  bor_rat_area + mdist_highways + mdist_parks,
                data=p, weights=area))
```

```r
error2 <- p$num - (predict(w, newdata=p[,-1:-2], type="response")^2)*p$area
rmse(error2)
## [1] 80.64783
```
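(Here `rmse` is not from a package; it is a small helper along these lines:)

```r
# assumed definition of the rmse() helper: root mean squared error of a residual vector
rmse <- function(error) sqrt(mean(error^2))
```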


The NB model predicts the number of crimes, with the district's area as an offset.

```r
summary(m3 <- glm.nb(num ~ LUM5_single + RNR_nres + mdist_daily + mdist_non_daily +
                       hType_mix_std + ratio_daily_nondaily_area + area_filtr +
                       num_community_places + employed + nden_daily + nden_non_daily +
                       bld_rat_area + bor_rat_area + mdist_smallparks + mdist_highways +
                       mdist_parks + offset(log(area)),
                     data=p, maxit = 1000))
```

```r
error <- p$num - predict(m3, newdata=p[,-1:-2], type="response")
rmse(error)
## [1] 121.8714
```


OLS residuals:

NB residuals:

So the RMSE is lower for the OLS, but the residuals do not look very normal…
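One way to check the residuals' normality more formally is a QQ-plot plus a Shapiro–Wilk test; a self-contained sketch on toy residuals (not my actual data):

```r
set.seed(2)
toy_fit <- lm(y ~ x, data = data.frame(x = rnorm(100), y = rnorm(100)))
r <- residuals(toy_fit)

qqnorm(r); qqline(r)  # points far from the line suggest non-normal residuals
shapiro.test(r)       # formal test: a small p-value indicates non-normality
```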