
Add conclusion and description of the data

Ryan Stewart 11 months ago
  1. 138
  2. 32

File diff suppressed because it is too large


@@ -6,15 +6,19 @@ output: html_document
# Introduction
Crime is an issue that has always plagued the world. Many families have been affected by it, and politicians are often asked to solve various crime-related problems and to reduce it. As with many vague and recurring topics, speculation has grown out of stereotypes, and the question is asked of how crime comes to be. In this data set (SHOULD WE TELL WHERE IT IS FROM OR EVEN PUT A CITATION?), some stereotypes will be tested for significance, and theories about causes such as education will be examined using the R programming language.
### Import libraries
Crime is an issue that has always plagued the world. Many families have been affected by it, and politicians are often asked to solve various issues related to crime, not to mention other issues such as youth poverty, poor education, or high unemployment. As statisticians, it is important to be able to take in data and produce a useful analysis of it in order to answer big questions like these. Although this data set does not contain real data, it provides a realistic scenario for analysis: a summary of different rates on a state-by-state basis.
#### Import libraries
### Read in the data
Read in the data
df = read.csv("crime.csv") %>% clean_names()
@@ -22,6 +26,28 @@ summary(df)
# Analysis of the dataset
Before we ask any questions about the data, we must understand what all of the columns in the data set mean. Below is a description of each column in the data set.
| Column | Description | Type |
| :--- | :---: | ---: |
| crime_rate | Crime rate (number of offenses per million population) | Continuous |
| youth | Young males ages 18-24 per 1000 | Discrete |
| southern | Whether it is a southern state. 1 = yes, 0 = no | Boolean |
| education | Average number of years of schooling | Discrete |
| expenditure_year0 | Per capita expenditure on police | Continuous |
| labour_force | Youth labor force (males employed ages 18-24 per 1000) | Discrete |
| males | Males per 1000 females | Discrete |
| more_males | 1 if there are more than 1000 males per 1000 females, otherwise 0 | Boolean |
| state_size | State size in hundreds of thousands | Discrete |
| youth_unemployment | Number of unemployed males ages 18-24 per 1000 | Discrete |
| mature_unemployment | Number of unemployed males ages 35-39 per 1000 | Discrete |
| high_youth_unemployment | High youth unemployment. 1 = yes, 0 = no | Boolean |
| wage | Median weekly wage | Continuous |
| below_wage | Number of families below half the median wage per 1000 | Discrete |
Although they are not used in this analysis, the data set also contains the same columns with `10` appended to the name, which hold the corresponding values from 10 years later.
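As a minimal sketch of how the later-year columns could be set aside, the snippet below drops every column whose name ends in `10`. The small data frame and its values are hypothetical stand-ins for `df`, and the exact column names after `clean_names()` are an assumption.

```r
# Hypothetical stand-in for df; the values are illustrative, not taken from crime.csv.
df <- data.frame(
  crime_rate   = c(45.5, 52.3),
  youth        = c(135, 140),
  crime_rate10 = c(26.5, 35.9),
  youth10      = c(132, 142)
)

# Columns whose names end in "10" are assumed to hold the data from 10 years later.
later_cols <- grep("10$", names(df), value = TRUE)

# Keep only the base-year columns for the analysis.
base_df <- df[, setdiff(names(df), later_cols), drop = FALSE]
names(base_df)
```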
## Do states in the south have a higher crime rate?
This is analyzed using a two-sample t-test at a confidence level of 0.95 (alpha = 0.05), comparing the states flagged as southern with the remaining (northern) states. The variances of the two groups were compared first so that the `var.equal` argument of `t.test` could be set appropriately.
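The procedure described above might look roughly like the following sketch. The data frame is simulated so that the example runs on its own. The F test via `var.test`, the 0.05 cutoff for deciding on equal variances, and the one-sided alternative are assumptions, not necessarily what the report's code does.

```r
# Simulated stand-in for df; the real analysis would use the data read from crime.csv.
set.seed(1)
df <- data.frame(
  southern   = rep(c(1, 0), times = c(16, 31)),
  crime_rate = c(rnorm(16, mean = 100, sd = 25),
                 rnorm(31, mean = 105, sd = 30))
)

# Split the crime rate by the southern flag.
south <- df$crime_rate[df$southern == 1]
north <- df$crime_rate[df$southern == 0]

# Assumption: an F test is used to compare the two group variances.
var_check <- var.test(south, north)
equal_var <- var_check$p.value > 0.05

# Assumption: one-sided test, H1 is that the southern mean is greater.
result <- t.test(south, north,
                 alternative = "greater",
                 var.equal   = equal_var,
                 conf.level  = 0.95)
result$p.value
```

A p-value below 0.05 would support the claim that southern states have a higher crime rate. With `var.equal = FALSE`, `t.test` falls back to the Welch approximation, which does not assume equal variances.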