Shanghai Datathon with the Centre for Doctoral Training

April 2018 · 3 minute read

Recently Jon, Sarah and I were lucky enough to be invited to participate in a Datathon (think of a Hackathon centred on extracting interesting information from a given dataset) hosted by Shanghai University. This was a four-day event in which we were divided into three groups and given complete freedom over what we did with the data!

Working alongside students from Shanghai University was such a positive experience. The students showed us around the University, took us to their lunch hall, and even bought us pizza! Every member of our teams contributed equally and put in 100%, allowing us to produce some impressive visualisations. Working around language barriers only added to the fun we had.

The datathon was focused on the NYC Taxi & Limousine Commission’s Trip Record Data, which contains information on over 1.1 billion New York City taxi journeys over an eight-year period (accessible here). Coincidentally, in my spare time I develop a local taxi company’s mobile applications and backend systems, and hence I had a few ideas about the interesting features we could attempt to extract from the dataset.

My team decided to use the Anaconda suite as well as a few web technologies to produce an interactive visualisation of shareable rides within Manhattan. We defined a shareable ride as one on which the driver could pick up an additional passenger along their route and drop off both passengers without drastically affecting the drop-off time of either passenger.
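To make that definition concrete, here is a minimal sketch of how such a check might look. It is not the code from our project: the `Trip` class, the `is_shareable` function, the straight-line (haversine) distances and the 20 km/h average speed are all assumptions made purely for illustration, and "delay" here means extra time spent in the vehicle compared with a direct ride.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees

AVG_SPEED_KMH = 20.0  # assumed average speed, illustrative only


@dataclass
class Trip:
    pickup: Point
    dropoff: Point


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def travel_min(a: Point, b: Point) -> float:
    """Estimated travel time in minutes (straight line at the assumed speed)."""
    return haversine_km(a, b) / AVG_SPEED_KMH * 60.0


def is_shareable(first: Trip, second: Trip, max_delay_min: float = 5.0) -> bool:
    """True if `second` can be picked up during `first` without delaying
    either passenger by more than `max_delay_min` minutes."""
    direct_first = travel_min(first.pickup, first.dropoff)
    direct_second = travel_min(second.pickup, second.dropoff)
    to_second = travel_min(first.pickup, second.pickup)

    # Order 1: pick up first, pick up second, drop first, drop second.
    ride_first = to_second + travel_min(second.pickup, first.dropoff)
    ride_second = (travel_min(second.pickup, first.dropoff)
                   + travel_min(first.dropoff, second.dropoff))
    worst_1 = max(ride_first - direct_first, ride_second - direct_second)

    # Order 2: pick up first, pick up second, drop second, drop first.
    ride_first = to_second + direct_second + travel_min(second.dropoff, first.dropoff)
    worst_2 = ride_first - direct_first  # the second passenger rides directly

    return min(worst_1, worst_2) <= max_delay_min


# Made-up coordinates roughly within Manhattan.
trip_a = Trip(pickup=(40.7500, -73.9900), dropoff=(40.7800, -73.9600))
same_way = Trip(pickup=(40.7650, -73.9750), dropoff=(40.7810, -73.9590))
other_way = Trip(pickup=(40.7650, -73.9750), dropoff=(40.7000, -74.0100))

print(is_shareable(trip_a, same_way))
print(is_shareable(trip_a, other_way))
```

A real implementation would want road-network travel times and the actual pickup timestamps rather than straight lines at a fixed speed, but the shape of the test stays the same: try each drop-off order and keep the one with the smallest worst-case delay.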

The sharing of rides is a pre-existing concept, though it is often only adopted by ride-sharing platforms such as Uber and Lyft. Sharing yellow NYC taxis could lead to reduced congestion, improved driver earnings and perhaps even a friendlier city.

The above image shows the interactive webpage we created to visualise shareable rides. The orange route shows the optimal path the driver could take in order to minimise delay. By making the parameters of our different algorithms adjustable through the web interface, we could quickly demonstrate their effectiveness. We found that around 75% of trips could be shared if people would accept a 5-minute delay in their arrival time.
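For anyone curious how a figure like that responds to the delay people will tolerate, here is a small sketch of the kind of parameter sweep involved. The function name `shareable_fraction` and the sample delays are hypothetical and entirely made up; they are not our data and will not reproduce the 75% figure.

```python
from typing import Optional, Sequence


def shareable_fraction(min_delays_min: Sequence[Optional[float]],
                       max_delay_min: float) -> float:
    """Fraction of trips whose best sharing option delays the passenger by at
    most `max_delay_min` minutes. `None` means no sharing partner was found."""
    if not min_delays_min:
        return 0.0
    shareable = sum(1 for d in min_delays_min
                    if d is not None and d <= max_delay_min)
    return shareable / len(min_delays_min)


# Made-up smallest achievable delay (in minutes) for eight imaginary trips.
sample_delays = [0.5, 2.0, 4.5, 7.0, None, 1.2, 9.5, 3.3]

for threshold in (1, 3, 5, 10):
    fraction = shareable_fraction(sample_delays, threshold)
    print(f"max delay {threshold:>2} min: {fraction:.1%} of trips shareable")
```

Sweeping the threshold like this is essentially what the sliders on a web interface let you do by hand.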

At the end of the Datathon our projects were presented to Professor Dame Wendy Hall, Professor Yi-Ke Guo, and Dr. Tuo Leng, and we answered higher-level questions on our chosen projects. For me this was certainly an interesting experience: my team and I spent much of our time during the datathon dwelling on low-level technical details, yet it is often human factors that determine the feasibility of an idea such as ride sharing.

The source code for this project is available here. It was the first time anyone in the team had used Python, and hence it was a wonderful opportunity to try to pick up some skills ‘on the job’.

There can be no work without play :)

Though we worked very hard on the trip, we were lucky enough to also get some downtime in the evenings. On Sunday the 25th we made a foray into Shanghai city centre, where we saw the Bund and visited the Oriental Pearl Tower (the building composed of large spheres pictured below).