Dec 192011
 

Now that we have a very simple function to download historical stock data from Yahoo, let’s see if we can use the same technique to download data from Google Finance.

If we head over to the main page for Google finance and pull up a quote for AAPL, we will see a link for historical prices on the upper left corner. Clicking on that link will bring up the historical prices page. By default, the page shows a nicely formatted table of prices with the latest price at the top of the page. Notice that there is an option to pick any two date ranges for the prices, and, just like Yahoo, there is a link to download the data to a spreadsheet. Now we pick a date range (in this case, Nov 1st – Nov 30th) and click the update button. Then we can download the data using the link on the right side of the page. By default it downloads as “data.csv”. We end up with the following data:

Date,Open,High,Low,Close,Volume
30-Nov-11,381.29,382.28,378.30,382.20,14502386
29-Nov-11,375.84,378.83,370.20,373.20,13429694
28-Nov-11,372.35,376.72,370.33,376.12,12380122
.
.
.
3-Nov-11,399.07,403.40,395.36,403.07,15768725
2-Nov-11,400.09,400.44,395.11,397.41,11723381
1-Nov-11,397.41,399.50,393.22,396.51,18992411

This data looks very similar to the Yahoo data, but without the adjusted close. It appears that Google adjusts the prices for us, so we will not have to. Also, notice the dates are formatted differently. It should be very easy to convert to our standard format that we defined in our Quote class in the previous post.

Now we need to decipher the link used to get the data. We copy the link by right clicking it and selecting “Copy link address” and end up with the following:

http://www.google.com/finance/historical?cid=22144&startdate=Nov+1%2C+2011&enddate=Nov+30%2C+2011&num=30&output=csv

We have the query options: cid, startdate, enddate, num, and output. Take a look at the date values. What’s up with the percent signs? Anytime you see “%” signs in a url, it means that it has been url encoded. By searching google, we can easily find many sites that will encode and decode url strings. By decoding the raw url string, we end up with:

http://www.google.com/finance/historical?cid=22144&startdate=Nov 1, 2010&enddate=Nov 30, 2011&num=30&output=csv

That looks much easier to figure out.

We now have:

  • cid = 22144
  • startdate = Nov 1, 2011
  • enddate = Nov 30, 2011
  • num = 30
  • output = csv

There is only one problem. What is the cid? Probably some sort of database id. But by observing the query string for the main page, we can see that it uses a “q” option that takes the exchange and symbol name. By playing around with different combinations of these parameters, I found that we can use the q option without the exchange, and the num option does not seem to be necessary. We can simplify the url string to:

http://www.google.com/finance/historical?q=AAPL&startdate=Nov 1, 2011&enddate=Nov 30, 2011&output=csv

The only thing left to do now is to subclass our original Quote class and customize it to build our url string and convert the dates to our common format. Here is what I ended up with:

class GoogleQuote(Quote):
  ''' Daily quotes from Google. Date format='yyyy-mm-dd' '''
  def __init__(self,symbol,start_date,end_date=datetime.date.today().isoformat()):
    super(GoogleQuote,self).__init__()
    self.symbol = symbol.upper()
    start = datetime.date(int(start_date[0:4]),int(start_date[5:7]),int(start_date[8:10]))
    end = datetime.date(int(end_date[0:4]),int(end_date[5:7]),int(end_date[8:10]))
    url_string = "http://www.google.com/finance/historical?q={0}".format(self.symbol)
    url_string += "&startdate={0}&enddate={1}&output=csv".format(
                      start.strftime('%b %d, %Y'),end.strftime('%b %d, %Y'))
    csv = urllib.urlopen(url_string).readlines()
    csv.reverse()
    for bar in xrange(0,len(csv)-1):
      ds,open_,high,low,close,volume = csv[bar].rstrip().split(',')
      open_,high,low,close = [float(x) for x in [open_,high,low,close]]
      dt = datetime.datetime.strptime(ds,'%d-%b-%y')
      self.append(dt,open_,high,low,close,volume)

This one is even simpler than the Yahoo version. There is no need to calculate adjusted prices. The main difference is the different url format and time formatting.

You can download a file with the Quote class and a few command prompt tests here: google.py

In a future post, we will consolidate all of these classes and create a Python module/command line utility to download historical data from these various sources save them to disk in a consistent format. Stay tuned!

  3 Responses to “Downloading Google Finance historical data with Python”

  1. […] Downloading Google Finance historical data with Python Share this:FacebookTwitterGoogle +1Like this:Like Loading… This entry was posted in 程式設計 and tagged Python, 程式設計. Bookmark the permalink. ← [轉貼] Skanect 1.0 Tutorial – Quick Self-Scan […]

 Leave a Reply

(required)

(required)