
Answered several questions, still have a few more to go

master
Ryan Stewart 11 months ago
parent
commit
75dcd4fb23
  1. 40
      project.rmd


@@ -23,19 +23,28 @@ Questions to answer. Make all pretty pretty after logic.
###1. Do states in the south have a higher crime rate? - Independent t test -S
```{r}
# Two-sample t-test comparing mean crime rate in southern vs. non-southern states
s = split(df, df$southern)
# Use [[ ]] so each group stays a plain data frame with its original column names
northern = s[[1]]
southern = s[[2]]
t.test(x = northern$crime_rate, y = southern$crime_rate)
# I don't think we should have a graph, we should use a statistic
#ggplot(newdf, aes(x=southern, y=crime_rate)) + geom_point() + geom_boxplot() #visualize
#will need to work out how to correctly show boxplot for southern and nonsouthern
```
Here you can see that there isn't a statistically significant difference in mean crime rate between northern and southern states. The data give no evidence that the crime rate depends on region.
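
The same comparison can also be written with the formula interface of `t.test`, which avoids splitting the data by hand. This is only a sketch on simulated data: the data frame `demo` and its values are made up for illustration, and it assumes `southern` is coded 0/1.

```r
# Simulated stand-in for the project data; these values are NOT real.
set.seed(1)
demo <- data.frame(
  southern   = rep(c(0, 1), times = c(30, 17)),
  crime_rate = rnorm(47, mean = 90, sd = 30)
)

# Welch two-sample t-test: crime_rate grouped by the 0/1 southern indicator
res <- t.test(crime_rate ~ southern, data = demo)

res$estimate  # mean crime rate in each group
res$p.value   # compare against the chosen significance level, e.g. 0.05
res$conf.int  # 95% confidence interval for the difference in means
```

Reporting the confidence interval alongside the p-value shows how large a difference the data could still be consistent with.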
###1. What is the ratio of males to females? - Simple statistics -S
```{r}
mean(df$males)/1000
#suggestion: there is already a section for more males. What could be useful is if we can divide the ones and zeros with their values relating to crime, and then a t.test can be performed easily. Same idea can be held for the southern states question.
```
This simple statistic shows that there are slightly fewer males than females.
###1. Comparison of education and poverty - Paired t test -S
```{r}
ggplot(df, aes(x=education, y= below_wage)) + geom_point() + theme_minimal()
```
```{r}
ggplot(df, aes(x=education, y= below_wage)) + geom_point() + theme_minimal() + geom_smooth(method=lm)
@@ -46,7 +55,7 @@ mdl1=lm(below_wage~education,data=df)
summary(mdl1)
```
Here, it can be seen that as education increases, the lower-wage measure (`below_wage`) decreases.
If the null hypothesis is that beta1 = 0, meaning no linear relationship, we would expect a large p-value when it is true. In this case, the p-value is low enough to reject that null hypothesis.
If the null hypothesis is that $\beta_1$ = 0, meaning no linear relationship, we would expect a large p-value when it is true. In this case, the p-value is low enough to reject that null hypothesis.
(I'm just writing generally here, I will make it better when everything else is done)
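
As a rough illustration of the slope test described here, the estimate and its p-value can be pulled out of the model summary directly. The sketch uses simulated data in a made-up data frame `demo`; the coefficients used to generate it are arbitrary and do not come from the project data.

```r
# Simulated stand-in data with a built-in negative relationship (illustration only)
set.seed(2)
demo <- data.frame(education = runif(47, min = 8, max = 13))
demo$below_wage <- 500 - 30 * demo$education + rnorm(47, sd = 20)

fit <- lm(below_wage ~ education, data = demo)

# Coefficient table: columns are Estimate, Std. Error, t value, Pr(>|t|)
coefs   <- coef(summary(fit))
slope   <- coefs["education", "Estimate"]
p_value <- coefs["education", "Pr(>|t|)"]

# Reject H0: beta_1 = 0 at the 5% level when the p-value is below 0.05
reject_null <- p_value < 0.05
```

Extracting the numbers this way makes it possible to quote them inline in the write-up instead of copying them from the printed summary.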
```{r}
@@ -56,14 +65,14 @@ It should be noted that the assumptions of homoscedasticity and errors of a norm
###1. Is crime reported more often in places that spend more on police? -S
```{r}
ggplot(df, aes(x=expenditure_year0, y= i_crime_rate)) + geom_point() + theme_minimal() + geom_smooth(method=lm)
ggplot(df, aes(x=expenditure_year0, y= crime_rate)) + geom_point() + theme_minimal() + geom_smooth(method=lm)
```
```{r}
mdl2=lm(i_crime_rate~expenditure_year0,data=df)
mdl2=lm(crime_rate~expenditure_year0,data=df)
summary(mdl2)
```
Based on this summary, the crime rate is expected to increase by 0.6283 for each one-unit increase in police expenditure. If our null hypothesis is beta1 = 0, indicating no linear relationship between these two variables, it can be rejected based on the small p-value.
Based on this summary, the crime rate is expected to increase by 0.6283 for each one-unit increase in police expenditure. If our null hypothesis is $\beta_1$ = 0, indicating no linear relationship between these two variables, it can be rejected based on the small p-value.
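
A slope estimate is easier to interpret together with its uncertainty and a few predicted values. The following is a minimal sketch on simulated data: the data frame `demo`, the generating coefficients, and the expenditure values in `new_obs` are all assumptions chosen for illustration.

```r
# Simulated stand-in data with a positive relationship (illustration only)
set.seed(3)
demo <- data.frame(expenditure_year0 = runif(47, min = 40, max = 170))
demo$crime_rate <- 15 + 0.6 * demo$expenditure_year0 + rnorm(47, sd = 20)

fit <- lm(crime_rate ~ expenditure_year0, data = demo)

# 95% confidence interval for the slope
slope_ci <- confint(fit, "expenditure_year0", level = 0.95)

# Predicted crime rate, with prediction intervals, at two hypothetical expenditure levels
new_obs <- data.frame(expenditure_year0 = c(60, 100))
pred    <- predict(fit, newdata = new_obs, interval = "prediction")
pred
```

The `predict` call returns one row per new observation with the fitted value and the lower and upper interval bounds.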
```{r}
plot(mdl2)
@@ -71,8 +80,23 @@ plot(mdl2)
This model assumes homoscedasticity and normally distributed errors. Based on the Residuals vs Fitted and Normal Q-Q plots, those assumptions are questionable here, though not as seriously as in the previous model.
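
The visual checks can be backed up with simple numerical ones. The sketch below runs on simulated data (the data frame `demo` is made up). It applies a Shapiro-Wilk test to the residuals and, as a rough homoscedasticity check in the spirit of a Breusch-Pagan test, regresses the squared residuals on the fitted values. It is not a substitute for a formal test.

```r
# Simulated stand-in data (illustration only)
set.seed(4)
demo <- data.frame(expenditure_year0 = runif(47, min = 40, max = 170))
demo$crime_rate <- 15 + 0.6 * demo$expenditure_year0 + rnorm(47, sd = 20)

fit <- lm(crime_rate ~ expenditure_year0, data = demo)

# Normality of errors: a small p-value suggests the residuals are not normal
sw <- shapiro.test(residuals(fit))
sw$p.value

# Rough homoscedasticity check: do squared residuals trend with the fitted values?
aux   <- lm(residuals(fit)^2 ~ fitted(fit))
aux_p <- coef(summary(aux))[2, "Pr(>|t|)"]
aux_p  # a small p-value hints at non-constant variance
```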
1. Can the number of people in poverty predict crime rate? - Linear regression (should also do a predict command with this)
1. What is the average education level of people in the area vs the amount of crime that occurs? - Years of education per crime rate
Here we take a look at the unemployment rate compared to the crime rate. We do not have a single unemployment figure, so we must take the sum of the young and old rates (we might want to drop this).
1. How does average education level of people in the area affect the amount of crime that occurs? - Years of education per crime rate
```{r}
ggplot(df, aes(education, crime_rate)) + geom_point() + theme_minimal()
```
As this graph shows, there is little to no correlation between education and crime rate, so fitting a model to these two variables would be ineffective.
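
The visual impression of "little to no correlation" can be quantified with `cor.test`. This sketch uses simulated, deliberately unrelated variables in a made-up data frame `demo`, so its numbers say nothing about the project data.

```r
# Simulated stand-in data with no built-in relationship (illustration only)
set.seed(5)
demo <- data.frame(
  education  = runif(47, min = 8, max = 13),
  crime_rate = rnorm(47, mean = 90, sd = 30)
)

# Pearson correlation with a test of H0: correlation = 0
ct <- cor.test(demo$education, demo$crime_rate)

ct$estimate  # sample correlation coefficient r
ct$p.value   # a large p-value means no evidence of a linear relationship
ct$conf.int  # 95% confidence interval for the correlation
```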
1. Can education level predict the amount of crime that occurs? - Linear regression
No
1. Is there more crime from young males compared to any other group?
