Mapping Manhattan Surge Pricing


Efficiently getting around Manhattan takes a certain combination of gumption, agility, and a keen sense of where you are trying to go, and where all of those people who (as I so kindly put it) don’t know how to walk are trying to go. You don’t try to hail a cab on 49th and Broadway after the Eugene O’Neill Theatre empties out. Each new person looking for a cab moves 20 feet farther up Broadway, undercutting those hopelessly waiting on the corner before a cab can reach them. The seasoned veteran knows to avoid jostling with tourists and the Times Square mafia by walking over to 8th or 9th Ave to find an open cab.

The key is knowledge. Whether you’re in town for the weekend or a resident who’s fortunate enough to experience NYC price inflation each day of the year, knowing little details like the direction of the avenues and when not to take the FDR is all part of adroitly navigating NYC. Enter Uber. Uber’s service and app are both designed to take the thinking out of traveling. The app takes care of payments, suggests easy pick-up locations nearby, and even optimizes the route by deferring to Waze. However, the cost Uber imposes for softening many of the pain points of traveling comes in the form of surge pricing. I don’t believe the end-of-days narrative that taxi monopolies across the country are trying to sell when it comes to Uber’s business model. Riders who are left rankled by surge pricing in their area need to take a page from my anecdote above and simply employ some Uber knowledge to get from point A to point B. Both Uber and Lyft have rolled out surge pricing maps to help riders understand the demand in their area at any given time and to help drivers meet that demand. In certain scenarios, you might not be able to escape surge pricing and should look to other ride-sharing services, public transportation, or even a taxi. It’s all about having as much information as possible to make the decision you’re most comfortable with.

The goal of this exercise is both selfish and altruistic.

The selfish:
– Play with the Uber API
– Learn some nifty Python mapping packages
– Ramble in blog form

The altruistic:
– Observe how surge pricing moves throughout evening rush hour
– Find areas of reduced surge pricing near areas that often have heightened demand

The aforementioned surge pricing maps are definitely more polished than my plots, and they operate in real time, but I hope that my work can tell a short story that ultimately leaves you with better information to get around the city. Check out my GitHub for the source code.

The Eye Candy

Before I show some pretty plots, I just wanted to take a moment to give credit where credit is due. I made my plots while heavily referring to the sample plots of London I found here. Also, I used the database of Manhattan restaurants at NYC Open Data to generate my rough grid of points. The points are roughly 2 city avenues apart (about a 5 min walk). You’ll notice empty patches over Central Park, Hudson Yards, and other areas devoid of restaurants. I essentially used restaurants as a proxy for foot traffic areas in Manhattan. I thought this was a valid compromise between the coverage issues of using subway stops and an exhaustive grid of the island.
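The restaurant-based grid can be thought of as a thinning step: snap each restaurant’s coordinates to a square cell and keep one point per occupied cell, so cells with no restaurants (Central Park, Hudson Yards) simply drop out. This is a minimal sketch under stated assumptions, not the exact code behind these plots: the 400 m cell size (roughly a five-minute walk), the single reference latitude for Manhattan, and the function names `cell_of` and `thin_to_grid` are all hypothetical, and it expects the restaurant records to have already been parsed into (lat, lon) tuples.

```python
import math

# Assumptions (not from the original post): ~400 m cells as a stand-in for a
# five-minute walk, and one reference latitude for Manhattan so every point
# uses the same longitude scale.
CELL_METERS = 400.0
REF_LAT = 40.75
METERS_PER_DEG_LAT = 111_320.0  # approximate
METERS_PER_DEG_LON = METERS_PER_DEG_LAT * math.cos(math.radians(REF_LAT))


def cell_of(lat, lon, cell_m=CELL_METERS):
    """Index of the roughly cell_m x cell_m grid cell containing (lat, lon)."""
    return (math.floor(lat * METERS_PER_DEG_LAT / cell_m),
            math.floor(lon * METERS_PER_DEG_LON / cell_m))


def thin_to_grid(points, cell_m=CELL_METERS):
    """Keep the first (lat, lon) point seen in each occupied grid cell.

    Cells with no restaurants produce no point at all, which is what leaves
    the empty patches over parks and rail yards.
    """
    kept = {}
    for lat, lon in points:
        kept.setdefault(cell_of(lat, lon, cell_m), (lat, lon))
    return list(kept.values())
```

Keeping an actual restaurant location (rather than the cell’s center) means every sampled point sits somewhere people really are, which fits the foot-traffic proxy described above.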

4:30 pm-7:30 pm Weekday Rush Hour

The first period I looked at was evening rush hour. You can see that getting an Uber in midtown between 4:30 pm and 6:30 pm on a weekday will almost always come with 1.5x surge pricing. For reference, I’ve annotated a few Manhattan points of interest, some of which are in busy business districts. The earliest surge occurs downtown near the Stock Exchange at 4 pm, when the markets close. This demand seems to be transient, as rates are back to normal fifteen minutes later.
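Each frame of the time-lapse corresponds to one 15-minute sampling interval. A minimal sketch of that bucketing step might look like the following; it assumes samples were stored as (timestamp, point id, surge multiplier) tuples, and that layout, the function names, and the toy values in the usage lines are hypothetical rather than taken from the real data.

```python
from collections import defaultdict
from datetime import datetime, timedelta

INTERVAL = timedelta(minutes=15)


def floor_to_interval(ts, interval=INTERVAL):
    """Round a datetime down to the start of its interval (16:07 -> 16:00)."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + ((ts - midnight) // interval) * interval


def surging_points_by_interval(samples, threshold=1.0):
    """Map each interval start to the set of points whose surge exceeds threshold.

    samples: iterable of (timestamp, point_id, surge_multiplier) tuples.
    Intervals where nothing surged still appear, with an empty set.
    """
    result = defaultdict(set)
    for ts, point_id, surge in samples:
        surging = result[floor_to_interval(ts)]  # creates the bucket if missing
        if surge > threshold:
            surging.add(point_id)
    return dict(result)


# Made-up toy samples shaped like the transient 4 pm Wall Street surge.
samples = [
    (datetime(2016, 5, 24, 16, 2), "wall_st", 1.5),
    (datetime(2016, 5, 24, 16, 17), "wall_st", 1.0),
]
by_interval = surging_points_by_interval(samples)
```

Here the 4:00 bucket contains `"wall_st"` and the 4:15 bucket is empty, which is the "back to normal fifteen minutes later" pattern in miniature.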

From 4:30 on, however, you can see varying surge prices in midtown from 20th St up to Central Park. Some intervals, like 4:45 pm and 5:45 pm, exhibit very localized demand near Penn Station. Between intervals, it’s pretty common for surge to bounce between 2x and 2.9x. If you’re looking to wait out the surge, you may be waiting until 7 pm, when driver supply seems to meet demand. Please see the appendix for links to the individual images making up the gif.

Time-lapse of Manhattan Surge Pricing During the Evening Rush Hour on Tuesday May 24, 2016.

In general, it seems that Wall Street surge pricing remains very localized, not extending too far uptown. If people working on the Street don’t mind the walk, I’d highly recommend walking a few blocks north to find much lower prices. Midtown offers little respite from demand-based pricing during the evening rush hour. The time-lapse above shows that you can sometimes find pockets of low surge, but if you find yourself at Grand Central, you may have to walk 20 blocks in either direction to see a change. At that point, just hail a cab or sweat it out in the subway.
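The "walk a few blocks" advice can be turned into a small lookup: given where you are and the current surge there, find the nearest sampled point within walking distance that has a lower multiplier. This is a hypothetical sketch, assuming one interval’s data is held as a dict of (lat, lon) to surge; the 1 km walking limit, the function names, the approximate Grand Central coordinates, and the snapshot values are all made up for illustration. Distances use the standard haversine great-circle formula.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(a, b):
    """Great-circle distance in meters between (lat, lon) points a and b."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def cheaper_pickup(here, surge_here, snapshot, max_walk_m=1000.0):
    """Closest point within max_walk_m whose surge is below surge_here, else None.

    snapshot: dict mapping (lat, lon) -> surge multiplier for one interval.
    """
    best, best_d = None, float("inf")
    for point, surge in snapshot.items():
        if surge >= surge_here:
            continue  # no savings at this point
        d = haversine_m(here, point)
        if d <= max_walk_m and d < best_d:
            best, best_d = point, d
    return best


grand_central = (40.7527, -73.9772)  # approximate
snapshot = {                         # toy values, not measured data
    (40.7527, -73.9772): 2.0,
    (40.7577, -73.9772): 1.0,        # about 0.55 km north
    (40.7727, -73.9772): 1.0,        # about 2.2 km north, too far to walk
}
```

With these toy values, `cheaper_pickup(grand_central, 2.0, snapshot)` returns the point about half a kilometer north; shrinking `max_walk_m` to 100 returns `None`.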

Average Manhattan Surge Pricing During Evening Rush Hour on May 24, 2016

Here’s the same time-lapse for the following day’s evening rush hour (it was a Wednesday). I can’t explain why there was such high demand in midtown as early as 4 pm with surge pricing extending even into the Upper East/West Sides. One area for future research would be to plot both the number of Uber drivers and Uber users at any given time to see whether surge pricing is the result of reduced supply or increased demand for a given time period. If I had to guess, I would say that there were fewer available drivers on this day as there’s increased surge even in SoHo and the East Village. I would expect rider demand during the evening rush hour to be pretty stable in those areas from day to day as they are generally less corporate than midtown.

Time-lapse of Manhattan Surge Pricing During the Evening Rush Hour on Wednesday May 25, 2016

Like above, here’s the average surge over the Wednesday evening rush hour. This further illustrates that this was a generally poor time to ride Uber if you’re pinching pennies.
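The average-surge maps boil down to a per-point mean over every interval sampled that evening. A minimal sketch, assuming the samples for one evening have been reduced to (point id, surge multiplier) pairs; the layout and the function name are hypothetical.

```python
from collections import defaultdict


def average_surge(samples):
    """Mean surge multiplier per point over all sampled intervals.

    samples: iterable of (point_id, surge_multiplier) pairs from one evening.
    Points are averaged only over the intervals where they were sampled.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for point_id, surge in samples:
        totals[point_id] += surge
        counts[point_id] += 1
    return {p: totals[p] / counts[p] for p in totals}
```

For example, a point sampled at 1.0x and 2.0x averages to 1.5x, so an average map smooths over the interval-to-interval bouncing visible in the time-lapse.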

Average Manhattan Surge Pricing During Evening Rush Hour on May 25, 2016

One note I’d like to make is that the gifs above are still only two samples from two evenings in Manhattan. My observations and the reasoning I attach to them are mostly conjecture. I hope this exploration will at least get you to move the pin over a few blocks next time surge pricing makes you second-guess calling an Uber. The change in price may be nominal, or you may end up finding a cab in the time it takes you to walk over to your new pick-up location. Regardless, I’m generally happier with my decision when I have a fuller understanding of my options.

7:30 pm – 10:30 pm Rush Hour

After a lull from 7:30 – 8:45 pm, surge pricing rose again mostly around Times Square and extended throughout the west side from Tribeca up to Columbus Circle. Broadway shows typically let out around 9 pm which may explain the activity between Penn and Columbus Circle, but the demand seems much more widespread than Broadway alone could explain.

Time-lapse of Manhattan Surge Pricing 7:30-10:30 pm, Tuesday May 24, 2016

I think the high demand from 9-10 pm may be a perfect storm of post-work dinners wrapping up while other Manhattanites are just starting their nights in Chelsea, Tribeca, and the Meatpacking District. The time-lapse below exhibits the same phenomenon along the west side, this time for Wednesday, May 25. The lesson here is that if you’re on the west side on a weeknight, either finishing up or just starting your night, you may have a harder time escaping surge pricing from 9-10 pm. You’ll have a much easier time avoiding surge pricing if you’re spending your weeknights on the east side of Manhattan.

Time-lapse of Manhattan Surge Pricing 7:30 – 10:30 pm, Wednesday May 25, 2016

For plots of the average surge prices over the two days, see the appendix.

Final Remarks and Further Work

Remember, the following remarks are based on two days of monitoring Uber. There’s always the risk of being misled by small-sample bias, but I had to start somewhere. With that said…

  • Surge pricing can vary every fifteen minutes, but there are some areas of Manhattan where demand is off the charts for sustained periods of time.
  • If you’re near Grand Central Station, try walking up to 50th st. for a chance at lower surge pricing.
  • Wall Street surge pricing seems relatively isolated. Walking a few blocks north may lead to lower fares.
  • Surge pricing is pretty low during the evening rush hour between Wall Street and Union Square.
  • After 9 pm on weekdays, expect surge pricing along the west side from Columbus Circle down to Chelsea.

Now that I have a reasonable framework for collecting and processing Uber API requests, I could do something similar for Yankees games in the Bronx, late-night downtown, or weekend brunch. While I’m generally happy with the plots I’ve made, an interactive application built with D3.js could really shine.
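A collection framework like this is essentially a polling loop: every 15 minutes, query each grid point and record the surge multiplier with a timestamp. The sketch below is hypothetical and deliberately leaves the actual API call out. `fetch_surge(lat, lon)` is a placeholder you would supply (for instance, a wrapper around whichever Uber API price-estimate request you use), and `poll_surge` and its parameters are my own names, not part of any library.

```python
import time
from datetime import datetime


def poll_surge(points, fetch_surge, n_rounds, interval_s=15 * 60,
               sleep=time.sleep, now=datetime.now):
    """Sample every point n_rounds times, waiting interval_s seconds between rounds.

    fetch_surge(lat, lon) -> float is a HYPOTHETICAL callable you supply; it
    stands in for whatever Uber API request you use to read a surge multiplier.
    Returns a list of (timestamp, (lat, lon), surge) records.
    """
    records = []
    for round_no in range(n_rounds):
        ts = now()  # one timestamp per round, so a round maps to one map frame
        for lat, lon in points:
            records.append((ts, (lat, lon), fetch_surge(lat, lon)))
        if round_no < n_rounds - 1:
            sleep(interval_s)  # no sleep after the final round
    return records
```

Passing `sleep` and `now` in as parameters makes the loop easy to test with fakes. A fixed sleep also drifts a little, since the requests themselves take time; a sturdier version might schedule each round against a wall-clock deadline and respect the API’s rate limits.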


Single Interval Images from May 24, 2016 can be found here

Single Interval Images from May 25, 2016 can be found here

Average Surge Price, 7:30-10 pm, May 24, 2016
Average Surge Price, 7:30-10 pm, May 25, 2016

Quick Take: Data Exploration Meets Fine Art

For anyone who’s made it this far, I’m going to switch gears a little to the less technical topic of data exploration. The format will also be a little different, as this post is a commentary on the FiveThirtyEight article exploring New York’s MoMA through the lens of an inquisitive art lover. For those unfamiliar with FiveThirtyEight, it’s a blog started by economic consultant turned baseball statistician turned political prognosticator turned modern data superstar Nate Silver. He and his team of analysts cover a wide breadth of topics, including sports, socioeconomic standards, and the political viability of Trump 2016. Even though the contributors have plenty of formal statistical training, the site best illustrates the importance of asking intriguing questions and the value of a well-placed scatter plot or heat map.

You don’t need a fancy degree or knowledge of various theorems to appreciate the value of asking the right questions, making some simple plots, and just watching the insights fall out. Often these insights are not an end in themselves, but they raise new questions where more refined analytical techniques can do their due diligence. MoMA’s GitHub repository is a wonderful source of data that was collected without an obvious motivation for subsequent analysis. While it’s useful for a museum to keep a log of the works it currently holds, that log lacks the obvious analytical value that, for example, a database of baseball statistics has. Data of this type is often among the most interesting to explore. Another example would be databases containing thousands of books or articles, through which anyone can explore the syntax and semantics of the English language or build an algorithm that identifies books on similar topics. Natural language processing deserves much more attention than I can give it here in a few sentences, but I highly recommend checking out Stanford’s Coursera course if you’re interested.

Alright, back to the article. How weird is it to refer to Van Gogh’s The Starry Night, one of the most recognizable pieces of art, as Object 79802? But in the age of Big Data, all of its singular beauty and detail give way to bigger questions asked of the collection to which it belongs. As the plot to the right suggests, MoMA lives up to its goal of displaying the art of our time. Note that most of the points, each representing a different piece of art, were acquired in the year they were painted (i.e., they lie on the line y = x; gotta love algebra). Many of the pieces, though, were acquired anywhere from 2 to 125 years after the oil was dry. The red regression line, also called the line of best fit, shows that the museum billed as a home for modern art tends to stay true to its name while also housing some older works. There’s a distinguishable horizontal line near y = 1985, showing that the museum acquired many works painted across a wide range of years within a very short time span. Another, hazier observation is the vertical column of points spanning about 1905 to 1920 on the x-axis and 1930 to 2000 on the y-axis. I don’t know much about modern art, but I do know that many iconic artists lived through the early twentieth century, including Pablo Picasso and Norman Rockwell, both represented in MoMA.
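The two ideas in that plot, the y = x line and the line of best fit, are easy to compute yourself. The sketch below is a generic ordinary least-squares fit plus the acquisition lag (acquisition year minus creation year, so 0 means a point sits on y = x); it is not necessarily how FiveThirtyEight produced its red line, and it assumes the creation and acquisition years have already been extracted from the MoMA data as two equal-length lists of numbers. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def acquisition_lags(created, acquired):
    """Years between creation and acquisition; 0 means the point lies on y = x."""
    return [a - c for c, a in zip(created, acquired)]
```

A slope near 1 with a small positive intercept would mean the museum tends to buy work shortly after it is made; a flatter slope would mean older works are acquired later in their lives.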

Another interesting plot later in the article illustrates the different aspect ratios across the various works in MoMA. While it’s clear that artists prefer working with rectangular ratios rather than squares, it’s hard to glean any more insight from the cluttered area plot below. Nonetheless, the appeal of this type of plot is how easy it is to glance at it and realize that the size of the bars directly translates to the shape of the painting as you would see it in the museum.

(1) Aspect Ratio
(2) Aspect Ratio

While not as straightforward, I think the other plot, which displays aspect ratios using a scatter plot, is more insightful. Note how the purple line (another regression line) runs straight through (x, y) = (16, 9). If you’re familiar with photography or watch as much HDTV as I do, you’ll recognize this pairing as one of the most common aspect ratios across all media. When exploring data, it’s very common to look for things you’re familiar with and would expect to be represented. Especially early on, this can be reassuring, but it’s important to approach data sets with minimal bias. Manipulating the output to support prior expectations, rather than letting the data guide your inference, is a dubious practice whose effects are felt throughout many scientific domains (see Bad Incentives Are Blocking Better Science).
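Recognizing a ratio like 16:9 is a matter of reducing width and height by their greatest common divisor, so 1920 x 1080 becomes 16:9. Measured artworks rarely reduce that neatly, so a second helper snaps a width and height to the nearest ratio from a short list. This is a hypothetical sketch; the list of "common" ratios and the function names are my own assumptions.

```python
from math import gcd

# Hypothetical shortlist of common aspect ratios (width, height).
COMMON_RATIOS = [(1, 1), (4, 3), (3, 2), (16, 9)]


def reduced_ratio(width, height):
    """Reduce integer width:height to lowest terms, e.g. 1920x1080 -> (16, 9)."""
    g = gcd(width, height)
    return width // g, height // g


def closest_common_ratio(width, height, candidates=COMMON_RATIOS):
    """Pick the candidate a:b whose value a/b is closest to width/height.

    Useful for measured artworks, whose dimensions rarely reduce exactly.
    """
    target = width / height
    return min(candidates, key=lambda r: abs(r[0] / r[1] - target))
```

For instance, a canvas measuring 91 x 51.5 (in any unit) snaps to 16:9, while a square canvas snaps to 1:1.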

Now for a little teaser of things to come. In the coming weeks I’ll be working on an independent project using the Consumer Financial Protection Bureau’s database of complaints. Started by a committee led by the pugnacious Elizabeth Warren, this database contains complaints consumers had about exploitative credit card agreements they were being held to, questionable car loan terms buried under pages of unnecessary jargon, and other practices that big businesses use to prey on consumers who have fallen on hard times. Like the MoMA database, these records weren’t necessarily collected so that some stats nerds could crunch a solution to help the little guy being preyed upon by business bullies. I really don’t know what to expect going into this data, but I want to use it as an opportunity to try my hand at natural language processing, since I’ve never really had to use it in any of my courses. Let’s do this!

The Mersenne Twister

When you type 2+2 into a calculator, it’s always going to spit out 4. The same is generally true for computer programs, whether it’s Microsoft Excel or the calculator on your iPhone. This is because computers are deterministic machines that are notoriously good at following explicit directions and are capable of performing billions of operations each second. In short, you expect the same input to always give you the same output.

What happens when you want a computer to do something random, like virtually rolling a die 5 times? It’s easy enough to do this in real life, but how do you get truly random results from a machine designed not to leave its output to chance?
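For the curious, here’s a compact sketch of the 32-bit Mersenne Twister (MT19937) in Python, written from the published algorithm rather than copied from any particular library. It illustrates the answer to the question above: the output only looks random, and the same seed always reproduces the same sequence. The class name and the `roll_die` helper are my own; mapping outputs to a die with `% 6` adds a tiny bias that doesn’t matter for a demo. (Python’s built-in `random` module also uses the Mersenne Twister, but it seeds the state differently, so don’t expect its numbers to match these.)

```python
class MT19937:
    """Minimal 32-bit Mersenne Twister (MT19937)."""

    N, M = 624, 397
    MATRIX_A = 0x9908B0DF
    UPPER_MASK = 0x80000000
    LOWER_MASK = 0x7FFFFFFF

    def __init__(self, seed=5489):
        # Expand a single 32-bit seed into 624 words of state.
        self.mt = [0] * self.N
        self.mt[0] = seed & 0xFFFFFFFF
        for i in range(1, self.N):
            prev = self.mt[i - 1]
            self.mt[i] = (1812433253 * (prev ^ (prev >> 30)) + i) & 0xFFFFFFFF
        self.index = self.N  # forces a twist before the first output

    def _twist(self):
        # Regenerate all 624 words of state at once.
        for i in range(self.N):
            y = ((self.mt[i] & self.UPPER_MASK)
                 | (self.mt[(i + 1) % self.N] & self.LOWER_MASK))
            nxt = self.mt[(i + self.M) % self.N] ^ (y >> 1)
            if y & 1:
                nxt ^= self.MATRIX_A
            self.mt[i] = nxt
        self.index = 0

    def next_u32(self):
        """Return the next 32-bit output after tempering."""
        if self.index >= self.N:
            self._twist()
        y = self.mt[self.index]
        self.index += 1
        y ^= y >> 11
        y ^= (y << 7) & 0x9D2C5680
        y ^= (y << 15) & 0xEFC60000
        y ^= y >> 18
        return y


def roll_die(gen, times=5):
    """Roll a six-sided die `times` times (the modulo mapping has a tiny bias)."""
    return [gen.next_u32() % 6 + 1 for _ in range(times)]
```

Seeding two generators with the same number gives identical rolls every time, which is exactly the determinism described above. "Random" here means statistically random-looking, not unpredictable: the Mersenne Twister is not suitable for cryptography, because its internal state can be reconstructed from enough consecutive outputs.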

Hello world!

Well, here it is. For all of you unfortunate people who have had to sit through one of my unavoidable math-fueled tangents, I’m taking my thoughts to this blog from now on. I can’t guarantee that you won’t hear me ramble about how a computer generates pseudo-random numbers from time to time, but hopefully the inception of this blog will lead to fewer glazed-over stares and fewer friends lost to my weird obsession with math and its ilk.

What can you expect by checking this blog out from time to time? I’m glad you asked. In general, I’ll cover everything from Data Science (start off with the sexy topic that draws the oohs and ahs) to the latest advances in computer hardware and architecture (please don’t leave! If you stay I’ll keep posting videos of baby pigs). Statistics during this upcoming election season, new open-source machine learning tools, even some campfire tales about math superstars like Carl Friedrich Gauss are fair game. I definitely don’t claim to be an expert on anything I post about, and I hope this blog will force me to go past cursory knowledge of a topic to a point where I can understand these things comfortably enough to write a coherent blog.

If you’ve made it this far down my first post, CONGRATULATIONS. I’ll leave you with a little taste of what’s to come. So I named the blog ‘The Probability of Success’. Why? First off, because awesome names like ‘Write That Down Bro’ were already taken. Secondly, I was searching for a math term that was less cliché than “the limit does not exist” and more nuanced than “the square root of 69 is 8-something, right?” Lastly, probability and statistics are probably the most useful parts of math that the everyday person could learn in a reasonable amount of time. Yes, I know that Trump is polling at 25% and Ted Cruz is polling at 10%, and I know that the former is larger than the latter, but I don’t think there’s any reason to give too much credence to any projections at this point. A little familiarity with sample sizes and the importance of having a sample pool that is representative of the larger population would surely make you think twice before buying that Trump 2016 shirt.
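To make the sample-size point concrete, the usual back-of-the-envelope tool is the margin of error for a sample proportion, z·sqrt(p(1 − p)/n), with z ≈ 1.96 for 95% confidence. The sketch below uses that normal approximation and assumes a simple random sample, which real polls only approximate; the 400-respondent poll is a made-up example, not the size of any actual poll.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Normal approximation z * sqrt(p * (1 - p) / n); assumes a simple random
    sample, which real polls only approximate.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical poll of 400 respondents with a candidate at 25%.
moe = margin_of_error(0.25, 400)
print(f"25% +/- {moe * 100:.1f} points")
```

This prints `25% +/- 4.2 points`. Quadrupling the sample to 1,600 only halves the margin, to about 2.1 points, and no sample size fixes a pool that isn’t representative to begin with.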

That’s it for tonight. Thanks for checking out the blog. Please come back eventually.