Introduction

Anyone who has been to New York City has probably taken a taxi. Because of the city's traffic, taxis are one of its most common methods of transportation. The hour between 5:00AM and 6:00AM is the least common hour to be picked up by a taxi in Manhattan. During this hour, the average trip distance is the highest of the day, while the average tip per mile is lower than the daytime average and not significantly higher than the average for the other nighttime hours. (Daytime hours are between 6:00AM and 8:00PM.)

Data Analysis

Taxis run throughout the day and night, but they are not hailed with the same frequency at all hours. The bar graph below shows the average number of taxi rides taken in each hour of the day.

The number of rides fluctuates throughout the day, but there is a clear, steady decrease after 9:00PM until the number picks back up at 6:00AM. An average of 4,271 rides are taken between 5:00AM and 6:00AM (highlighted in purple), compared to 28,916 at 7:00PM, nearly seven times as many.
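For readers who want to reproduce hourly averages of this kind, the following Python sketch shows one possible approach. It assumes each trip record carries a pickup timestamp and that the average is taken across the days observed; the `pickups` list is made-up illustrative data, not the taxi dataset, and `average_rides_per_hour` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime

def average_rides_per_hour(pickup_times):
    """Average number of pickups in each hour of the day, across the days observed."""
    n_days = len({t.date() for t in pickup_times})
    totals = Counter(t.hour for t in pickup_times)
    # Hours with no pickups average to 0 because Counter returns 0 for missing keys.
    return {hour: totals[hour] / n_days for hour in range(24)}

# Hypothetical pickups over two days (illustrative only).
pickups = [
    datetime(2020, 1, 1, 5, 10),
    datetime(2020, 1, 1, 19, 5),
    datetime(2020, 1, 1, 19, 40),
    datetime(2020, 1, 2, 19, 15),
]
averages = average_rides_per_hour(pickups)
print(averages[5], averages[19])
```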

Aside from being the hour with the fewest rides, 5:00AM is also the hour with the highest average trip distance. The bar graph below shows the average trip distance, in miles, for each hour of the day.

The average trip distance at 5AM is 4.87 miles, while the averages for most other hours stay within the 2 to 3 mile range.

## 
## Call:
## lm_basic(formula = trip_distance_mean ~ 1 + five_flag, data = taxi_hour)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.4795 -0.3745 -0.1561  0.1835  1.9489 
## 
## Coefficients:
##               Estimate  2.5 % 97.5 %
## (Intercept)     2.7711 2.5311  3.011
## five_flagTRUE   2.1019 0.9259  3.278
## 
## Residual standard error: 0.5551 on 22 degrees of freedom
## Multiple R-squared:  0.3844, Adjusted R-squared:  0.3565 
## F-statistic: 13.74 on 1 and 22 DF,  p-value: 0.00123

To determine whether the increase in trip distance at 5AM is statistically significant, I fit the regression model shown above, which compares the mean trip distance at 5AM with the mean across all other hours. The 95% confidence interval for the 5AM coefficient (0.9259 to 3.278 miles) is strictly positive, with an estimated average increase of 2.1019 miles at 5AM versus all other hours of the day.
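With a single TRUE/FALSE predictor, the fitted intercept is the mean of the FALSE group and the slope is the difference between the two group means. This is why 2.7711 + 2.1019 = 4.8730 recovers the 5AM average trip distance. The Python sketch below checks that identity with the closed-form least-squares formulas on made-up numbers; `fit_indicator_model` is a hypothetical helper and is not the `lm_basic` function that produced the output above.

```python
from statistics import mean

def fit_indicator_model(y, flag):
    """Least-squares fit of y = b0 + b1 * flag, where flag is True/False."""
    x = [1.0 if f else 0.0 for f in flag]
    x_bar, y_bar = mean(x), mean(y)
    # Closed-form simple regression: slope = S_xy / S_xx.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical hourly means (illustrative only); the last value plays the flagged hour.
y = [2.5, 3.0, 2.0, 3.5, 5.0]
flag = [False, False, False, False, True]

intercept, slope = fit_indicator_model(y, flag)
mean_other = mean([v for v, f in zip(y, flag) if not f])
mean_flagged = mean([v for v, f in zip(y, flag) if f])

# The intercept matches the unflagged mean; the slope matches the gap between means.
print(intercept, mean_other)
print(slope, mean_flagged - mean_other)
```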

I calculated the average tip per mile so that tips would not appear inflated for long trips. As the bar graph below shows, tips per mile are lower overall at night.

The hour between 5:00AM and 6:00AM is once again highlighted in purple. There appears to be a slight increase in tip per mile during this hour, and its value of about $1.23 (the intercept plus the 5AM coefficient in the night-hours model below, 1.16119 + 0.06517) is the second highest among the nighttime hours.
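The report does not say whether tip per mile was computed for each trip and then averaged, or as total tips over total miles within each hour. The Python sketch below assumes the per-trip version and skips zero-distance trips; the tuple layout, the helper name `mean_tip_per_mile_by_hour`, and the sample trips are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

def mean_tip_per_mile_by_hour(trips):
    """trips: iterable of (pickup_hour, tip_amount, trip_distance) tuples."""
    by_hour = defaultdict(list)
    for hour, tip, distance in trips:
        if distance > 0:  # a zero-distance trip has no defined tip per mile
            by_hour[hour].append(tip / distance)
    return {hour: mean(ratios) for hour, ratios in by_hour.items()}

# Hypothetical trips (illustrative only): a long ride and a short ride in the same hour.
trips = [
    (5, 6.00, 10.0),   # $6 tip over 10 miles -> $0.60 per mile
    (5, 2.00, 1.0),    # $2 tip over 1 mile   -> $2.00 per mile
    (14, 3.00, 2.0),   # $3 tip over 2 miles  -> $1.50 per mile
    (14, 1.00, 0.0),   # skipped: zero distance
]
result = mean_tip_per_mile_by_hour(trips)
print(result)
```

In this toy data the long ride has the larger tip in dollars but the smaller tip per mile, which is the distortion the per-mile normalization is meant to remove.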

I created two models to test the significance of these differences. The first model estimates how tip per mile differs between nighttime and daytime hours.

## 
## Call:
## lm_basic(formula = tip_per_mile_mean ~ 1 + night_flag, data = taxi_hour)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32130 -0.01660  0.03171  0.05426  0.18171 
## 
## Coefficients:
##                Estimate   2.5 % 97.5 %
## (Intercept)      1.4305  1.3655  1.496
## night_flagTRUE  -0.2628 -0.3635 -0.162
## 
## Residual standard error: 0.1173 on 22 degrees of freedom
## Multiple R-squared:  0.5712, Adjusted R-squared:  0.5517 
## F-statistic: 29.31 on 1 and 22 DF,  p-value: 1.947e-05

There is a significant decrease in tip per mile for taxis taken between 8:00PM and 6:00AM, with an average decrease of $0.26 per mile (95% confidence interval: $0.16 to $0.36).
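The interval for `night_flagTRUE` can be reproduced from the printed summary. The Python sketch below assumes the usual pooled-variance formula for the standard error of a difference between two group means, 14 daytime and 10 nighttime hours (from the definition of daytime in the introduction), and a hard-coded t critical value of 2.0739 for 22 degrees of freedom. None of these steps appear in the original output.

```python
import math

estimate = -0.2628        # night_flagTRUE coefficient from the table
resid_se = 0.1173         # residual standard error from the table
n_day, n_night = 14, 10   # assumption: 6AM-8PM is day, 8PM-6AM is night
t_crit = 2.0739           # 97.5th percentile of the t distribution, 22 df

# Standard error of a difference between two group means with pooled variance.
std_error = resid_se * math.sqrt(1 / n_day + 1 / n_night)
lower = estimate - t_crit * std_error
upper = estimate + t_crit * std_error
print(round(lower, 4), round(upper, 4))
```

The result agrees with the 2.5 % and 97.5 % columns of the table up to rounding.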

The second model looks specifically at the 5AM hour, but compares it only to the other nighttime hours.

## 
## Call:
## lm_basic(formula = tip_per_mile_mean ~ 1 + five_flag, data = taxi_night)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.16825 -0.04410  0.01300  0.04809  0.11792 
## 
## Coefficients:
##               Estimate    2.5 % 97.5 %
## (Intercept)    1.16119  1.08675  1.236
## five_flagTRUE  0.06517 -0.17021  0.301
## 
## Residual standard error: 0.09684 on 8 degrees of freedom
## Multiple R-squared:  0.04849,    Adjusted R-squared:  -0.07045 
## F-statistic: 0.4077 on 1 and 8 DF,  p-value: 0.541

The 95% confidence interval for the 5AM coefficient (-0.17 to 0.30) contains 0, so the difference in tip per mile between 5AM and the other nighttime hours is not statistically significant.
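The same conclusion can be read off the test statistic. The Python sketch below rebuilds the standard error from the summary, assuming one 5AM row and nine other nighttime hours, and compares the squared t statistic with the reported F statistic of 0.4077. The critical value 2.3060 for 8 degrees of freedom is hard-coded rather than computed.

```python
import math

estimate = 0.06517      # five_flagTRUE coefficient from the table
resid_se = 0.09684      # residual standard error from the table
n_five, n_other = 1, 9  # assumption: one 5AM row and nine other night hours
t_crit = 2.3060         # 97.5th percentile of the t distribution, 8 df

std_error = resid_se * math.sqrt(1 / n_five + 1 / n_other)
t_stat = estimate / std_error
f_stat = t_stat ** 2    # with one predictor, F equals t squared

# The coefficient is significant at the 5% level only if |t| exceeds the critical value.
significant = abs(t_stat) > t_crit
print(round(t_stat, 3), round(f_stat, 4), significant)
```

Because the t statistic is well below the critical value, the 5AM coefficient is not distinguishable from zero at the 5% level, which matches the interval containing 0.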

Conclusion

In addition to being the hour in which the fewest taxis are taken, 5AM also has the highest average trip distance, an estimated 2.1019 miles above the average for the other hours. The spike in average trip distance for taxis taken between 5:00AM and 6:00AM may be the result of a higher proportion of trips to destinations such as the airport. However, while tips per mile are generally lower during the nighttime hours, the tip per mile at 5AM is not significantly higher than that of the other nighttime hours.