Statistics Project: Helmets and Law

Executive Summary

The government recently passed a law requiring helmets to be worn inside the city. The policy had been controversial for many years, and an earlier attempt to enact it failed because citizens opposed it. This time, because of the dramatic increase in deaths caused by traffic accidents, the government is enforcing the law strictly. For that reason, we chose helmet wearing, and the factors that may affect it, as the topic of our project. Because our money, time and manpower were limited, we chose RMIT students as our population. Our purpose was to find out whether there is a relationship between the law and the number of people wearing helmets, so we observed for two weeks before and two weeks after the law took effect. This report presents our data and explains how we analysed the possible relationship between helmet wearing and factors such as the law and the time of day, using confidence intervals for proportions and Z tests of hypotheses. The results should show what problems may occur and whether the law is being applied effectively. They may also provide ideas for research on the difference in the number of deaths in accidents before and after the law was passed, which could help set future directions for reducing traffic accidents.
Introduction
According to Vietnam Net, a popular local online newspaper, the number of people killed in traffic accidents in Ho Chi Minh City reached an alarming level in 2006, with around 1,332 accidents resulting in 1,014 deaths, mainly from brain injuries. Law 32, which makes helmets compulsory on highways from 15th September and on all streets from 15th December, seems to be the best solution to the problem. However, many people, especially students, still prefer not to wear helmets when riding. This report therefore uses RMIT International University Vietnam (South Saigon campus) as the sample in an observational study, to analyse how students react to the law and how the law and other factors, such as the time of day, affect the number of students wearing helmets. The report first describes the methodology used to collect the data and summarises the data with measures of central tendency and variation, illustrated by diagrams such as histograms and a Pareto chart. The data are then analysed with Z tests, confidence intervals for proportions and hypothesis testing, to see whether other factors influence the effectiveness of Law 32 and whether there is any relationship between helmet wearing and factors such as the time of day and the law. Finally, proposed solutions to the issues are given.
Data Collection
This project was carried out by the Heroes Team: Tam, Chuong, Duy, Khoa and Dung. We wanted to know whether the number of students wearing helmets would change after 15th December. At first, we planned to conduct a survey of students' opinions about the law, use those opinions to estimate whether they would obey it, and then guess how different the situation would be compared with before the law. However, a survey seemed likely to give us irrelevant and biased data. We therefore decided to conduct an observational study to obtain accurate counts of students wearing helmets, observing for two weeks before and two weeks after 15th December. Since 15th September students have only had to wear helmets on highways, so we counted students wearing helmets at the intersection of Nguyen Van Linh Boulevard and RMIT University. Each group member stood at the intersection for two hours, one in the morning and one in the afternoon, and counted according to the schedule shown below. At the end of each day, the last person on duty was also responsible for summarising the data collected that day. Overall, we completed our task successfully. All of us did our best to manage our timetables and complete the team's tasks, even though we had to stand in the sun for around an hour at a time. Although RMIT has about 3,000 students in total, only a thousand or so attend on any given day. Consequently, a sample of 2,000 student observations was taken over four weeks using simple random sampling with replacement: on average 100 students per day, half in the morning and half in the afternoon.
We chose this sampling strategy because we could not control who would be observed and because students have different timetables each day, so other strategies such as systematic, stratified or cluster sampling, or even simple random sampling without replacement, seemed impossible. We also agreed on two important things to observe carefully: the number of students wearing helmets and whether they were wearing them correctly. We wanted accurate figures for both to avoid collecting irrelevant data. The data collected in the two weeks before and the two weeks after the law are given below:

Week  Student  Law     Date        Day        Morning hour   Morning helmets  Afternoon hour  Afternoon helmets
1     Khoa     Before  12/3/2007   Monday     7:00 - 8:00    13               12:00 - 1:00    20
1     Tam      Before  12/4/2007   Tuesday    8:00 - 9:00    14               1:00 - 2:00     25
1     Chuong   Before  12/5/2007   Wednesday  9:00 - 10:00   20               2:00 - 3:00     21
1     Dung     Before  12/6/2007   Thursday   10:00 - 11:00  19               3:00 - 4:00     23
1     Duy      Before  12/7/2007   Friday     11:00 - 12:00  23               4:00 - 5:00     27
2     Khoa     Before  12/10/2007  Monday     7:00 - 8:00    17               12:00 - 1:00    26
2     Tam      Before  12/11/2007  Tuesday    8:00 - 9:00    26               1:00 - 2:00     29
2     Chuong   Before  12/12/2007  Wednesday  9:00 - 10:00   31               2:00 - 3:00     30
2     Dung     Before  12/13/2007  Thursday   10:00 - 11:00  32               3:00 - 4:00     15
2     Duy      Before  12/14/2007  Friday     11:00 - 12:00  35               4:00 - 5:00     32

Week  Student  Law     Date        Day        Morning hour   Morning helmets  Afternoon hour  Afternoon helmets
3     Khoa     After   12/17/2007  Monday     7:00 - 8:00    41               12:00 - 1:00    49
3     Tam      After   12/18/2007  Tuesday    8:00 - 9:00    45               1:00 - 2:00     48
3     Chuong   After   12/19/2007  Wednesday  9:00 - 10:00   47               2:00 - 3:00     48
3     Dung     After   12/20/2007  Thursday   10:00 - 11:00  46               3:00 - 4:00     49
3     Duy      After   12/21/2007  Friday     11:00 - 12:00  49               4:00 - 5:00     48
4     Khoa     After   12/24/2007  Monday     7:00 - 8:00    50               12:00 - 1:00    50
4     Tam      After   12/25/2007  Tuesday    8:00 - 9:00    48               1:00 - 2:00     49
4     Chuong   After   12/26/2007  Wednesday  9:00 - 10:00   50               2:00 - 3:00     50
4     Dung     After   12/27/2007  Thursday   10:00 - 11:00  49               3:00 - 4:00     50
4     Duy      After   12/28/2007  Friday     11:00 - 12:00  50               4:00 - 5:00     50

Data summary

In the two weeks before the law, we found that many students did not wear helmets while riding on Nguyen Van Linh highway. In the first week, only 205 of the 500 students observed (41.0%) wore helmets. A week later the figure had risen to 273. This may be because some students were already preparing for the law taking effect on 15th December.

After the law, in the third week, 470 of 500 students wore helmets at the intersection of Nguyen Van Linh Boulevard and RMIT University. In the last week this number increased by 26 to 496. This may be because students had become more familiar with the law. However, a small proportion still did not wear helmets, for example students living near Nguyen Van Linh highway.
Clearly, the number of students wearing helmets increased sharply after 15th December, from just 205 in the first week to 496 in the last. Besides the law, other factors such as the weather and the time of day probably affect the number of students wearing helmets. For example, fewer students wore helmets in the morning than in the afternoon. One reason might be that they want to protect their skin from the afternoon sun. Furthermore, the police patrol Nguyen Van Linh Street less often in the afternoon than in the morning. As the diagrams show, 705 students wore helmets in the morning against 739 in the afternoon. To get a clearer view of whether the law and the time of day determined the change in the number of students wearing helmets, the following sections discuss each in more detail.
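The weekly and time-of-day totals quoted in this summary can be re-derived from the two data tables. The following is a minimal Python sketch, not part of the original analysis; the variable names are illustrative, and the figure of 500 observations per week assumes 50 students were observed in each of the ten hourly sessions.

```python
# Hourly counts of helmet wearers transcribed from the data tables
# (one morning hour and one afternoon hour per day, Monday to Friday).
morning = {
    1: [13, 14, 20, 19, 23],
    2: [17, 26, 31, 32, 35],
    3: [41, 45, 47, 46, 49],
    4: [50, 48, 50, 49, 50],
}
afternoon = {
    1: [20, 25, 21, 23, 27],
    2: [26, 29, 30, 15, 32],
    3: [49, 48, 48, 49, 48],
    4: [50, 49, 50, 50, 50],
}
OBSERVED_PER_WEEK = 500  # assumption: 5 days x 2 hours x 50 students

# Total helmet wearers per week and their share of the students observed.
weekly_totals = {w: sum(morning[w]) + sum(afternoon[w]) for w in morning}
weekly_share = {w: t / OBSERVED_PER_WEEK for w, t in weekly_totals.items()}

# Totals over all four weeks, split by time of day.
morning_total = sum(sum(v) for v in morning.values())
afternoon_total = sum(sum(v) for v in afternoon.values())

for week, total in weekly_totals.items():
    print(f"Week {week}: {total} of {OBSERVED_PER_WEEK} ({weekly_share[week]:.1%})")
print("Morning total:", morning_total, "Afternoon total:", afternoon_total)
```

Adding weeks 1 and 2, and weeks 3 and 4, gives the before-law and after-law totals used later in the inferential tests.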
Wearing Helmets and Time

Statistic            Wear Helmets In The Morning  Wear Helmets In The Afternoon
N Valid              20                           20
N Missing            0                            0
Mean                 35.25                        36.95
Std. Error of Mean   3.085                        2.901
Median               38.00                        40.00
Mode                 50                           50
Std. Deviation       13.795                       12.976
Variance             190.303                      168.366
Skewness             -0.365                       -0.241
Range                37                           35
Minimum              13                           15
Maximum              50                           50
Sum                  705                          739

The numbers of students wearing helmets in both the morning and the afternoon are negatively skewed. This is probably because the figures were quite low in the first two weeks and then rose sharply to nearly 500 students per week. The mean and median differ between the morning and the afternoon, while the mode is 50 in both. Although the mean number of students wearing helmets is lower in the morning than in the afternoon, the standard error of the mean is higher in the morning (3.085 against 2.901).
In addition, the counts reached the maximum possible value of 50, the number of students we observed in each hour. However, because the data from before and after the law differ so much, the standard deviations for both the morning and the afternoon are large, meaning that individual observations lie far from their means.
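The descriptive statistics in the table can be reproduced from the 20 hourly counts for each time of day. The sketch below uses only Python's standard `statistics` module; the helper name `describe` is illustrative, and skewness is left out because statistical packages apply their own adjusted formula.

```python
import statistics as st

# Hourly counts of helmet wearers over the four weeks, from the data tables.
morning = [13, 14, 20, 19, 23, 17, 26, 31, 32, 35,
           41, 45, 47, 46, 49, 50, 48, 50, 49, 50]
afternoon = [20, 25, 21, 23, 27, 26, 29, 30, 15, 32,
             49, 48, 48, 49, 48, 50, 49, 50, 50, 50]


def describe(counts):
    """Return the main summary statistics for a list of hourly counts."""
    n = len(counts)
    sd = st.stdev(counts)  # sample standard deviation (n - 1 denominator)
    return {
        "n": n,
        "mean": st.mean(counts),
        "median": st.median(counts),
        "mode": st.mode(counts),
        "std_dev": sd,
        "variance": st.variance(counts),
        "std_error": sd / n ** 0.5,  # standard error of the mean
        "range": max(counts) - min(counts),
        "sum": sum(counts),
    }


for label, data in (("Morning", morning), ("Afternoon", afternoon)):
    print(label, describe(data))
```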

The negatively skewed histogram above shows the frequency of each count we collected in the morning. The number of students wearing helmets rose from 13 at the start to 50 at the end of week 4.
In total, we spent 20 hours collecting data in the morning. The mean of 35.25 implies that, on average, about 35 of the 50 students observed in each morning hour wore helmets. However, the data are widely spread around the mean, with a standard deviation of 13.795.

In the afternoon, the mean number of students wearing helmets is 36.95, with a high standard deviation of 12.976, reflecting how much the data differ between the periods before and after the law. From a minimum of 15, the figure also reaches a peak of 50 in the last two weeks. This can be seen more clearly in the box-and-whisker plots below:

The box-and-whisker plots above show how the numbers of students wearing helmets in the morning and in the afternoon are distributed. The distance from the minimum to the median is clearly greater than the distance from the median to the maximum, which reflects the negative skewness. The afternoon figures are also slightly higher than the morning ones.
Effects of the Law
To understand better how the law affects the number of students wearing helmets, the data are analysed below:

In the first two weeks, relatively few students wore helmets when riding into the intersection of Nguyen Van Linh and RMIT University. In the morning, although the maximum is quite high, the mean is just 23 students per hour, and the data are widely spread around the mean with a standard deviation of 7.746.
This may be because some students had already prepared for the law while others still preferred not to wear helmets, so the numbers fluctuated.

The highest and lowest morning counts in the first two weeks are 35 and 13 respectively. After 15th December, however, these rose sharply to a new minimum of 41 and a maximum of 50. The standard deviation also fell sharply to about 2.877, so the data lie much closer to the mean.

Before the law, the mean number of students wearing helmets on Nguyen Van Linh Street in the afternoon was about 25, which is higher than the morning figure of 23. The afternoon counts are spread around the mean with a standard deviation of 5.16, compared with 7.746 in the morning. This may be because students tend to travel at more regular times in the afternoon. Although the afternoon mean is somewhat higher than the morning one, the afternoon maximum was only 32, against 35 in the morning.


The afternoon standard deviation, quite high before the law, is only 0.876 after 15th December. In the last two weeks, the number of students wearing helmets in the afternoon increased sharply, with a mean of 49.1, and reached the peak of 50 for the first time at the start of week 4. The minimum rose from 15 in the first two weeks to 48, and the maximum from 32 to 50.
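The per-period means, standard deviations and extremes discussed in this section can be recomputed from the tabulated counts. This is a small Python sketch under the assumption that the statistics are sample statistics (n - 1 denominator); the dictionary layout and names are illustrative.

```python
from statistics import mean, stdev

# Hourly counts grouped by time of day and by period relative to the law.
counts = {
    ("morning", "before"): [13, 14, 20, 19, 23, 17, 26, 31, 32, 35],
    ("morning", "after"): [41, 45, 47, 46, 49, 50, 48, 50, 49, 50],
    ("afternoon", "before"): [20, 25, 21, 23, 27, 26, 29, 30, 15, 32],
    ("afternoon", "after"): [49, 48, 48, 49, 48, 50, 49, 50, 50, 50],
}

# Mean, sample standard deviation, minimum and maximum for each group.
summary = {
    key: {"mean": mean(v), "std_dev": stdev(v), "min": min(v), "max": max(v)}
    for key, v in counts.items()
}

for (time_of_day, period), s in summary.items():
    print(f"{time_of_day:9} {period:6} mean={s['mean']:.1f} "
          f"sd={s['std_dev']:.3f} min={s['min']} max={s['max']}")
```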
Clearly, the law has played an essential role in the increase. For a clearer view of how the law has affected the number of students wearing helmets in both the morning and the afternoon, we can also look at the Pareto diagram below:

Overall, in the last two weeks the figures nearly doubled in both the morning and the afternoon: from just over 110 to over 240 students wearing helmets in the afternoon, and from 89 to nearly 150 in the morning.

Finally, another problem that needs to be mentioned is how students wear their helmets. Before the law, no one cared whether helmets were worn correctly. After the law, however, the police stop students who wear helmets incorrectly; for example, they fine riders 50,000 VND for not fastening the helmet strap. In some cases students fasten the strap but still do not wear the helmet as the rule requires. This explains why the percentage of students wearing helmets incorrectly is high, at over 88% (Vietnam Net). Wearing a helmet incorrectly is also very dangerous: in an accident, the rider's neck, rather than the brain, may be seriously injured, which can be fatal.
Inferential Statistics



In this part, we use several statistical techniques to analyse possible relationships between different variables and the number of people wearing helmets:

- A confidence interval estimate for the proportion, to estimate the share of people wearing helmets before and after the law.

- A Z test for the proportion, in terms of the number of successes, to check whether the proportion of people not wearing helmets is 0.5.

- A Z test for the difference between two proportions, to identify possible relationships between the law, the time of day and helmet wearing.

Confidence Interval Estimate for the Proportion

A 95% confidence interval estimate of the population proportion can be used to estimate the share of RMIT students wearing helmets before and after the law, based on the 1,000 students observed in each period. Of these, 478 wore helmets before the law and 966 after it.

Before law:

Sample Size 1000
Number of Successes 478
Confidence Level 95%

Intermediate Calculations
Sample Proportion 0.478
Z Value -1.95996398
Standard Error of the Proportion 0.015796075
Interval Half Width 0.030959739

Confidence Interval
Interval Lower Limit 0.447040261
Interval Upper Limit 0.508959739
We have 0.447 ≤ p ≤ 0.509. Therefore, we are 95% confident that between 44.7% and 50.9% of RMIT students wore helmets before the law.

After law:


Sample Size 1000
Number of Successes 966
Confidence Level 95%

Intermediate Calculations
Sample Proportion 0.966
Z Value -1.95996398
Standard Error of the Proportion 0.005730969
Interval Half Width 0.011232492

Confidence Interval
Interval Lower Limit 0.954767508
Interval Upper Limit 0.977232492


We have 0.955 ≤ p ≤ 0.977. Therefore, we are 95% confident that between 95.5% and 97.7% of RMIT students wore helmets after the law. This is much higher than the figure before the law.
Based on these sample statistics, we can estimate with 95% confidence that before the new helmet law took effect only 44.704% to 50.896% of RMIT students wore helmets. Once the government enforced the law, the percentage rose sharply to between 95.477% and 97.723%.
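The interval limits above follow from the usual large-sample formula p ± Z·sqrt(p(1 − p)/n). The sketch below is one way to reproduce them in Python with the standard library only; the function name `proportion_confidence_interval` is illustrative and is not taken from the spreadsheet tool used for the original output.

```python
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = (p_hat * (1 - p_hat) / n) ** 0.5
    half_width = z * std_error
    return p_hat - half_width, p_hat + half_width


before = proportion_confidence_interval(478, 1000)
after = proportion_confidence_interval(966, 1000)
print(f"Before law: {before[0]:.5f} to {before[1]:.5f}")
print(f"After law:  {after[0]:.5f} to {after[1]:.5f}")
```

The approximation is reasonable here because both the number of successes and the number of failures in each sample are well above 5.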

Z Test for the Proportion In Terms of the Number of Successes


From observing the data, our assumption is that before the law the number of students wearing helmets was equal to the number not wearing one. To check whether this is correct, we use the Z test for the proportion in terms of the number of students who obeyed the helmet law when riding on Nguyen Van Linh Boulevard. Of the 1,000 students we observed in the first two weeks, 478 wore helmets and 522 did not.

In terms of proportions, the null and alternative hypotheses are stated as follows:
Ho: p = 0.5, that is, the proportion of students wearing helmets before the law was 0.5
H1: p ≠ 0.5, that is, the proportion of students wearing helmets before the law was not 0.5

Because we want to know whether our assumption is right, a two-tail test is used with a level of significance α of 0.05. The decision rule is
Reject Ho if Z < -1.96 or if Z > +1.96
Otherwise do not reject Ho

Null Hypothesis p= 0.5
Level of Significance 0.05
Number of Successes 478
Sample Size 1000

Intermediate Calculations
Sample Proportion 0.478
Standard Error 0.015811388
Z Test Statistic -1.39140217

Two-Tail Test
Lower Critical Value -1.959963985
Upper Critical value 1.959963985
p-Value 0.164103507
Do not reject the null hypothesis



Because Z = -1.39 lies between -1.96 and +1.96, we do not reject Ho. Thus, there is insufficient evidence that the proportion of students wearing helmets before the law differs from 0.5. In other words, before the law the number of students wearing helmets was roughly equal to the number not wearing one. After the law was passed, however, the figure changed markedly to 966, about double the number before the law.
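The test statistic and p-value in the table can be checked with a few lines of Python. This is a minimal sketch of the one-sample Z test for a proportion, assuming the standard error is computed under the null value p0 = 0.5; the function name is illustrative.

```python
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-tail Z test of H0: p = p0 for a single proportion."""
    p_hat = successes / n
    std_error = (p0 * (1 - p0) / n) ** 0.5  # standard error under H0
    z = (p_hat - p0) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p_value = one_proportion_z_test(478, 1000, 0.5)
critical = NormalDist().inv_cdf(0.975)  # about 1.96 for alpha = 0.05
reject = abs(z) > critical
print(f"Z = {z:.4f}, p-value = {p_value:.4f}, reject H0: {reject}")
```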
Z Test for Two Proportions

To evaluate differences between two proportions on the basis of two samples, that is, 1,000 RMIT students before and 1,000 after the law, as well as 1,000 students in the morning and 1,000 in the afternoon, we used the Z test for the difference between two proportions. The test is used to determine whether the proportion of students wearing helmets after the law is higher than before, and whether the proportion in the afternoon differs from that in the morning. We therefore used a lower-tail test for the law and a two-tail test for the time of day.
Law and Wearing Helmets

The null and alternative hypotheses are
Ho: p1 ≥ p2, that is, the proportion of students wearing helmets before the law is greater than or equal to the proportion after the law.
H1: p1 < p2, that is, the proportion of students wearing helmets before the law is less than the proportion after the law.
Since this lower-tail test is carried out at the 0.05 level of significance, the critical value is -1.645.
The decision rule is:
Reject Ho if Z < -1.645
Otherwise do not reject Ho.
Hypothesized Difference 0
Level of Significance 0.05
Before law
Number of Successes 478
Sample Size 1000
After law
Number of Successes 966
Sample Size 1000

Intermediate Calculations
Group 1 Proportion 0.478
Group 2 Proportion 0.966
Difference in Two Proportions -0.488
Average Proportion 0.722
Z Test Statistic -24.35644092

Lower-Tail Test
Lower Critical Value -1.644853627
p-Value 2.4772E-131
Reject the null hypothesis
Using the 0.05 level of significance, the Z test statistic is far below the lower critical value (-24.356 < -1.645), so we reject the null hypothesis Ho. Hence, we conclude that the proportion of students wearing helmets was higher after the law than before it, which is consistent with the law having had an effect. This matches the rise from 478 students wearing helmets in the first two weeks to 966 in the last two.
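For reference, the test statistic in the table can be worked out by hand from the pooled proportion. The notation below (X for the number of successes, n for the sample size, and a bar for the pooled estimate) is the standard textbook form and is assumed here rather than quoted from the report.

```latex
\bar{p} = \frac{X_1 + X_2}{n_1 + n_2} = \frac{478 + 966}{1000 + 1000} = 0.722

Z = \frac{p_1 - p_2}{\sqrt{\bar{p}\,(1 - \bar{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
  = \frac{0.478 - 0.966}{\sqrt{0.722 \times 0.278 \times 0.002}}
  \approx \frac{-0.488}{0.02004}
  \approx -24.36
```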
Times and Wearing Helmets


The null and alternative hypotheses are
Ho: p1 = p2, that is, the proportion of students wearing helmets in the morning is equal to the proportion in the afternoon.
H1: p1 ≠ p2, that is, the two proportions are not equal.
Since the test is to be carried out at the 0.05 level of significance, the critical values are -1.96 and +1.96.

The decision rule is:
Reject Ho if Z < -1.96
Or if Z > +1.96
Otherwise do not reject Ho.
Hypothesized Difference 0
Level of Significance 0.05
Morning
Number of Successes 705
Sample Size 1000
Afternoon
Number of Successes 739
Sample Size 1000

Intermediate Calculations
Group 1 Proportion 0.705
Group 2 Proportion 0.739
Difference in Two Proportions -0.034
Average Proportion 0.722
Z Test Statistic -1.696965146

Two-Tail Test
Lower Critical Value -1.959963985
Upper Critical Value 1.959963985
p-Value 0.08970325
Do not reject the null hypothesis

Using the 0.05 level of significance, the Z test statistic (-1.697) lies between the lower critical value (-1.960) and the upper critical value (+1.960), so the null hypothesis is not rejected. In brief, there is insufficient evidence that the time of day affects the number of students wearing helmets.
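The morning and afternoon comparison can be reproduced with a pooled two-proportion Z test. The Python sketch below uses only the standard library; the function name `two_proportion_z_test` is illustrative, and the test assumes the two samples are independent.

```python
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-tail pooled Z test of H0: p1 = p2 for two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # average proportion under H0
    std_error = (pooled * (1 - pooled) * (1 / n1 + 1 / n2)) ** 0.5
    z = (p1 - p2) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Morning: 705 of 1000 wore helmets; afternoon: 739 of 1000.
z, p_value = two_proportion_z_test(705, 1000, 739, 1000)
reject = p_value < 0.05
print(f"Z = {z:.4f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Comparing the p-value with the significance level is equivalent to comparing Z with the critical values of -1.96 and +1.96.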
After these analyses, we found that although the number of students wearing helmets before 15th December was roughly equal to the number not wearing them, it changed markedly after the law was passed. To illustrate, the confidence interval for the proportion shows with 95% confidence that between 44.7% and 50.9% of RMIT students wore helmets before the law, whereas the figure jumped to between 95.5% and 97.7% after it.
Reflection

Achievement


While working on our report, we made some decisions that had a positive effect on the data collection:

- The workload was manageable: each member needed to count an average of 50 students per hour, so we finished our tasks successfully. In addition, the sample is large, so its sampling error is low.

- The sample size for each week is equal and was measured systematically: 500 students were observed each week.
General Problems



Overall, we made some mistakes during this project. The most important was choosing observation as the means of data collection. We call it a mistake because our observations gave us too little information to analyse in the Inferential Statistics part. As a result, that part takes up only a small place in our report and covers only a few questions, even though it is the highest-scoring section.

Besides, the most serious problem was that we chose the wrong sample for our original purpose. In particular, we wanted to examine the relationship between the number of accidents and the law. However, our sample and population are RMIT students, while the statistics on accidents and deaths are for the whole of Ho Chi Minh City, so we cannot draw a conclusion about this relationship. We were also naive in assuming that data on accidents and deaths would be easy to find. In contrast, such data are updated slowly and mostly appear in articles rather than in reports. Therefore, we could not find data on the number of accidents after 15th December to include in the inferential part.


Sampling error



Although we took a sample of about 2,000 students over the two weeks before and the two weeks after 15th December, sampling error is unavoidable because we did not observe the whole population, which would be costly, time-consuming and inefficient. Nevertheless, our sampling error should be small, because the sample is equivalent to about two thirds of all RMIT students.

Sampling bias


Since we observed only from 7am to 5pm and not on Saturdays, data for other times, especially the evening, are missing. We did not have the resources to observe at those times, so this bias was also unavoidable. In addition, students sometimes arrive in large groups, which makes it difficult for one person to count them all. There is therefore a chance that some students were counted more than once while others were not counted at all. However, we drew up a systematic schedule for all members and learned from experience in order to keep these errors to a minimum.
Lurking Variables

- Location: some students live near RMIT University, so they sometimes do not wear helmets when riding to school.
- Outside students: students from other universities or high schools sometimes visit RMIT University, and we could not always tell which riders were the RMIT students we needed to count. The number of students wearing helmets may therefore be overcounted.
- Hypothesis-testing error: using a sample statistic to make a decision about a population parameter may lead to Type I and Type II errors.

Recommendations

As mentioned above, our observations did not support our original purpose, and a number of other problems also occurred during the project. Some basic ideas that might help to improve it are given below:

- First, to accomplish our objectives, we should choose another location, such as a specific street, and a larger sample, so that we can collect data not only on whether the law and the time of day affect the number of people wearing helmets but also on accidents and deaths. The latter could be gathered by giving students a survey with questions such as “Have you had any accidents?” and “If yes, when did it happen?”.

- Second, to address the remaining errors, we suggest extending the observation schedule to include Saturdays and evenings, which would probably cover all kinds of RMIT students and give them an equal chance of being observed. We should also ask riders whether they are RMIT students before counting them.

- Finally, we should attempt to control the lurking variables in order to obtain more accurate data about the situation.
Conclusion




After four weeks of observation at the intersection of Nguyen Van Linh highway and RMIT University, we organised and analysed the data to examine whether the number of people wearing helmets is affected by factors such as the law and the time of day. Using confidence intervals and Z tests for proportions, we found strong evidence that the number of people wearing helmets increased significantly after the law. We did not, however, find evidence that the time of day affects the number of people wearing helmets. Nevertheless, a causal conclusion cannot be drawn without a controlled experiment, because the results might also be due to other factors such as sampling error and selection bias. Unfortunately, we could not test the number of deaths and injuries, or whether students wore helmets correctly, because no up-to-date data were available for the period after 15th December.

References

- Levine, D, Stephan, D, Krehbiel, T & Berenson, M 2005, Statistics for Managers: Using Microsoft Excel, 4th edn, Pearson Education Inc., Upper Saddle River, New Jersey.

- Vietnam Net, “TPHCM: Mỗi ngày hơn 50 người chết và bị thương vì TNGT”, viewed 23rd December 2007, <http://vietnamnet.vn/baylenvietnam/giaothong/2007/04/687470/>

- Vietnam Net, “Đội mũ bảo hiểm sai cách, nhiều người nhập viện cấp cứu”, viewed 30th December 2007, <http://vietnamnet.vn/xahoi/2007/12/761406/>