Dec 192011
 

So far, in my previous posts, we have been downloading DAILY stock prices from the well known sites Yahoo and Google. Both sites offer very easy access to free daily historical data. Both sites also offer nice charts of the stock prices. But some of you may have noticed that Google charts not only daily prices, but they also chart INTRADAY prices down to two minute resolution. However, there are no published links to get the raw data used to build the charts. Bummer. Let’s do a little investigating and see what we can figure out.

For those of you using the excellent Chrome browser, it has a built in utility to examine virtually all aspects of a web page (or site). It is called Developer Tools. It can be started clicking on the wrench icon in the upper right corner, selecting Tools, then Developer Tools (or simply Ctrl+Shift+I). This utility has an enormous amount of functionality, but we are mainly interested in any network traffic a page generates. So lets start by pulling up the Google Finance summary page for AAPL. Then start the developer tools and click on the Network tab. Finally, on the chart for AAPL, click on the 2min interval link. At the bottom of the page will be a list of all the network requests that were generated by clicking on the 2min link. Looking at them, there is one called getprices. That looks promising! Now click on that link and then the “response” tab along the top to see what the server response was to the request. Here is what my browser looked like:

The data we are interested in looks like this:

EXCHANGE%3DNASDAQ
MARKET_OPEN_MINUTE=570
MARKET_CLOSE_MINUTE=960
INTERVAL=120
COLUMNS=DATE,CLOSE,HIGH,LOW,OPEN,VOLUME
DATA=
TIMEZONE_OFFSET=-300
a1324322040,383.71,383.82,383.56,383.655,8114
1,383.89,383.9,383.61,383.7,15512
2,383.844,384.03,383.81,383.83,15864
3,383.51,383.91,383.41,383.84,25327
4,383.58,383.7,383.44,383.44,16367
5,383.16,383.53,383,383.51,37838
6,383,383.19,382.95,383.18,16285
7,383.15,383.21,383,383.13,0
8,382.99,383.2,382.95,383.1998,0
9,382.888,383.018,382.77,383.018,0
10,382.96,383.02,382.75,382.9,0

Now we are getting somewhere! That sure looks like price and volume data, along with some sort of time stamp and a list of offsets. Here is what the link looked like:

http://www.google.com/finance/getprices?q=AAPL&x=NASD&i=120&sessions=ext_hours&p=5d&f=d,c,v,o,h,l&df=cpct&auto=1&ts=1324323553905

If we copy that link and put it directly in the url bar of our browser, we can get the full data set. It appears to have some header data followed by a list of prices. The dates appear to consist of a periodic time value followed by a series of offsets.

So there are two things to figure out. First we need to decipher the url string just like we did for the daily data, and then we need to decode the time stamp data in the returned results.

Let’s tackle the url string first. We will do it just like the other two previous attempts. We end up with:

  • The base url is http://www.google.com/finance/getprices
  • q is the symbol (AAPL)
  • x is the exchange (NASD)
  • i is the interval in seconds (120 = seconds = 2 minutes)
  • sessions is the session requested (ext_hours)
  • p is the time period (5d = 5 days)
  • f is the requested fields (d,c,v,o,h,l)
  • df ?? (cpct)
  • auto ?? (1)
  • ts is potentially a time stamp (1324323553 905)

Again, I spent some time playing around with the query parameters and found that you can get away with only q, i, p, and f. A simplified url (to get 10 days of 5 minute AAPL data) looks like this:

http://www.google.com/finance/getprices?q=AAPL&i=300&p=10d&f=d,o,h,l,c,v

Go ahead and click on the link to try it. Now we can download intraday data for a specific period back in time. There may be a way to get it between any two dates, but I haven’t figured that out yet. If anyone knows, I sure would be interested in finding out how.

The next problem is to figure out how to deal with the weird date formats. In data returned from the example above, I got “a1323181800” in the first date field and moving down, the last “a” prefixed number displayed was “a1324045800”. Those look suspiciously like Unix time stamps to me. All major programming languages support them and they are simply the number of seconds since 1/1/1970. Again, using my friend Google, I found an online Unix time stamp converter and plugged in the above numbers (without the “a”). I got: “Tue, 06 Dec 2011 14:30:00 GMT” and “Fri, 16 Dec 2011 14:30:00 GMT” for the second. Bingo! They are Unix time stamps. And the data appears to be listed with the oldest first. So, to get the date for each time stamp, we just need to convert any “a” prefixed times to the Unix time, then for each succeeding time, we add the offset multiplied by the number of interval seconds to the last recorded Unix time until we get an new “a” prefixed time. It sounds a lot more complicated than it really is.

Since I am only interested in the DOHLC values, I am going to ignore the header fields in our dataset. And since this post is getting pretty long, I will present the Python class I wrote to parse the data:

class GoogleIntradayQuote(Quote):
  ''' Intraday quotes from Google. Specify interval seconds and number of days '''
  def __init__(self,symbol,interval_seconds=300,num_days=5):
    super(GoogleIntradayQuote,self).__init__()
    self.symbol = symbol.upper()
    url_string = "http://www.google.com/finance/getprices?q={0}".format(self.symbol)
    url_string += "&i={0}&p={1}d&f=d,o,h,l,c,v".format(interval_seconds,num_days)
    csv = urllib.urlopen(url_string).readlines()
    for bar in xrange(7,len(csv)):
      if csv[bar].count(',')!=5: continue
      offset,close,high,low,open_,volume = csv[bar].split(',')
      if offset[0]=='a':
        day = float(offset[1:])
        offset = 0
      else:
        offset = float(offset)
      open_,high,low,close = [float(x) for x in [open_,high,low,close]]
      dt = datetime.datetime.fromtimestamp(day+(interval_seconds*offset))
      self.append(dt,open_,high,low,close,volume)

It should be pretty self explanatory. The main difference between this class and the others is that is takes an interval (in seconds) and a lookback (in days) instead of a start/end time. I ignore the first five header fields, then check for date fields prefixed with an “a”. I keep track of the offset and calculate the correct date with the line:

dt = datetime.datetime.fromtimestamp(day+(interval_seconds*offset))

There is one last tricky part: Notice that the OHLC data is listed in a non-standard format. It actually is returned as CHLO, no matter what order the f fields are listed.

It should be noted that it appears that there is only the most recent 10-20 days of intraday data available at any given time. Buy hey, what do you want for nothing! Later on, we will try to find some interesting things to do with this data and these limitations.

And finally, a complete example: google_intraday.py

  17 Responses to “Downloading Google Intraday historical data with Python”

  1. Nicely done. The good thing with this is that you can access historic currency data, which is not otherwise available through either yahoo or google.

    This would’ve simplified the csv parsing/writing for you:
    http://docs.python.org/library/csv.html#examples

  2. Very elegant code. Concise, yet readable. Tanks for this. I hope you find the time to post again.

  3. i have to admit this alluded me, being able to download this was such a help thanks, i would have never got the dates working myself. I use the csv lib to crunch the text files i collect after for charting

  4. […] of all, I would like to thank Mr. Mark Chenoweth, who originally wrote the entire script in Python 2.7, without his encouragement I would not have […]

  5. This page has an Excel spreadsheet with VBA that downloads intraday backfill from Google.

    http://investexcel.net/free-intraday-stock-data-excel/

  6. Man, you are the best.

    I’ve searched all the web, and your method is the most reliable one. Nearly 3 years passed and it still works beautifully.

    Why no updates in your blog? I hope you can continue to share more great things.

  7. elegant, thank u.

  8. You should make this a package in python, get some momentum going here.

    I wonder how this compares to ultrafinance package.
    But this code still is the best.

  9. google now seems to limit to 4000 lines, is there a way around this?

  10. But, this won’t give more than 50 day data…

  11. […] FURTHER READING     Chenoweth, M., 2011, Downloading Google Intraday historical data with Python […]

  12. “ts” is supposedly the tag for (Unix) timestamp; if omitted it defaults to today – but in practice I can’t seem to get it to return anything (even EOD data) from more than 20 days ago.

  13. How to get intraday data between two time stamps. For ex, I want 1 minute resolution data between 10:00 and 10:30 hrs only, or data from 11:00 hrs till present?

  14. Hi, buddy, your code is really helpful. But I have two question. First, there are some data were missing during a day. Such as some row of several separate minutes were randomly missing. Do you find this kind of issue? Second, the open price of time t is not equal to close price of time t-1. Does this make sense? I think it doesn’t make sense if each time interval is continues.

    Look forward to hearing from you.

  15. Hello

    Really nice code. I was trying to get One minute data by the same code. Is it possible?

 Leave a Reply

(required)

(required)