For this Valentine’s Day themed data doodle, let’s use
BigQuery and the New York
City public dataset on Citibike trips and to find a romantic spot to bike
to with a significant other.
The number 1 spot for groups of people to bike to together is the
5 Ave & E 88 St Citibike station,
right by Central Park, with 30% of trips completed by groups. Also in top
spots were the other stations around Central Park:
5 Ave & E 78 St
and
5 Ave & E 93 St.
The runner-up is
Cadman Plaza E & Tillary St
in Brooklyn, with 26% of trips completed by groups. Third place goes to
Brooklyn Bridge Park — Pier 2, with 22% of trips completed by groups.
Where does the data say you shouldn’t you take your significant other?
Only 1% of trips completed at the Penn Station Valet station were
by groups of cyclists.
Code for the
query to find where couples are going on the NYC Citibikes:
#standardSql
SELECT
group_trips / (single_trips + group_trips) AS percent_groups,
single_trips + group_trips AS total_trips,
q.end_station_id AS end_station_id,
stations.name AS name,
stations.latitude AS latitude,
stations.longitude AS longitude
FROM (
SELECT
COUNTIF(group_size = 1) AS single_trips,
COUNTIF(group_size != 1) AS group_trips,
end_station_id
FROM (
SELECT
ROUND(UNIX_SECONDS(starttime) / 120) AS start,
-- round to nearest 2 minutes
ROUND(UNIX_SECONDS(stoptime) / 120) AS stop,
-- round to nearest 2 minutes
start_station_id,
end_station_id,
COUNT(*) AS group_size
FROM
`bigquery-public-data.new_york.citibike_trips`
GROUP BY
start,
stop,
start_station_id,
end_station_id )
GROUP BY
end_station_id ) q
LEFT JOIN
`bigquery-public-data.new_york.citibike_stations` AS stations
ON
q.end_station_id = stations.station_id
ORDER BY
percent_groups DESC
Breaking this query down, I group Citibike trips together into ones that start
and end at about the same time. If the grouped-together trip had more than one
person in it, then it’s counted as a group trip. This could be a couple or a
group of friends riding together. Possibly it could be people that happened
to check out a bike at the same time and go to the same place at the same
rate, but I’m considering that a rare occurrence. Then, I find which stations
have the highest percentage of trips from groups versus single-rider
trips.
If you want to run the analysis yourself, I’ve shared the query
here. You can run the BigQuery queries in this data doodle for
free, no credit card required, with 1TB of free queries per month.