Dec 032011
 

In the previous post, we learned to manually build a url to download historical data for any stock symbol between any two dates. Now we want to build a set of simple Python classes to retrieve the data and store it in a consistent manner.

However, there is one more thing to figure out. Some of you may have noticed that there is an extra field (Adj Close) in addition to the normal Open/High/Low/Close/Volume data associated with a daily stock price. It turns out that stock prices can be adjusted for either dividends paid out or stock splits. For example, if a stock is trading at $500, a company board may decide to create 5 shares of $100 stock for each original $500 share. That way it will be easier for some investors to buy the stock. The day after this happens, the quoted price will drop from $500 to $100, but the company will still be worth the same (with five times as many shares). Yahoo data includes the actual OHLC prices on the date they were published, as well as an Adjusted Close price. Normally for the most recent prices, the close price is the same as the adjusted close price and nothing needs to be done. But for data that was published before the most recent split or dividend, the two prices will be different. In order to get a consistent representation of the price for those days, the OHLC prices must be multiplied by the adjusted close divided by the close for that day. You will end up with a new set of prices for that day that were different than what was published on that day, but be proportionally accurate relative to today’s prices. This is essential to back test trading systems accurately.

Now we can finally start writing some code! Let’s think about what we want to accomplish.

  • We want to save the symbol name
  • We want to save the date and time. (Time is not required for daily quotes, but we will be adding an intraday source later.)
  • We want to save the Open, High, Low, Close, and Volume (OHLCV) values for each day.
  • We want to save the OHLCV as Lists (arrays) in memory that are easy to manipulate.
  • We want the ability to save the data locally in comma separated files.
  • We want to be able to read our local files back into memory.
  • We want consistent date/time formatting.
  • We want the prices ordered from oldest to newest to make it easy on our later calculations.

First we will create a base class that contains all of the common functions we require.

class Quote(object):
  
  DATE_FMT = '%Y-%m-%d'
  TIME_FMT = '%H:%M:%S'
  
  def __init__(self):
    self.symbol = ''
    self.date,self.time,self.open_,self.high,self.low,self.close,self.volume = ([] for _ in range(7))

  def append(self,dt,open_,high,low,close,volume):
    self.date.append(dt.date())
    self.time.append(dt.time())
    self.open_.append(float(open_))
    self.high.append(float(high))
    self.low.append(float(low))
    self.close.append(float(close))
    self.volume.append(int(volume))
      
  def to_csv(self):
    return ''.join(["{0},{1},{2},{3:.2f},{4:.2f},{5:.2f},{6:.2f},{7}\n".format(self.symbol,
              self.date[bar].strftime('%Y-%m-%d'),self.time[bar].strftime('%H:%M:%S'),
              self.open_[bar],self.high[bar],self.low[bar],self.close[bar],self.volume[bar]) 
              for bar in xrange(len(self.close))])
    
  def write_csv(self,filename):
    with open(filename,'w') as f:
      f.write(self.to_csv())
        
  def read_csv(self,filename):
    self.symbol = ''
    self.date,self.time,self.open_,self.high,self.low,self.close,self.volume = ([] for _ in range(7))
    for line in open(filename,'r'):
      symbol,ds,ts,open_,high,low,close,volume = line.rstrip().split(',')
      self.symbol = symbol
      dt = datetime.datetime.strptime(ds+' '+ts,self.DATE_FMT+' '+self.TIME_FMT)
      self.append(dt,open_,high,low,close,volume)
    return True

  def __repr__(self):
    return self.to_csv()

The __init__ method creates class attributes that will store our data. There are two string constants that will be used later to format dates and times consistently.
The append method takes a python datetime object and the OHLCV values and appends them to the end of the lists created above.
The csv methods should be self explanatory. They convert the python lists to csv format and vice versa. They also handle reading and writing to data to disk for permanent storage.
For those of you new to python, the __repr__ function returns a printable representation of an object.

Now it is time to subclass the Quote class and customize it to download our Yahoo data.

class YahooQuote(Quote):
  ''' Daily quotes from Yahoo. Date format='yyyy-mm-dd' '''
  def __init__(self,symbol,start_date,end_date=datetime.date.today().isoformat()):
    super(YahooQuote,self).__init__()
    self.symbol = symbol.upper()
    start_year,start_month,start_day = start_date.split('-')
    start_month = str(int(start_month)-1)
    end_year,end_month,end_day = end_date.split('-')
    end_month = str(int(end_month)-1)
    url_string = "http://ichart.finance.yahoo.com/table.csv?s={0}".format(symbol)
    url_string += "&a={0}&b={1}&c={2}".format(start_month,start_day,start_year)
    url_string += "&d={0}&e={1}&f={2}".format(end_month,end_day,end_year)
    csv = urllib.urlopen(url_string).readlines()
    csv.reverse()
    for bar in xrange(0,len(csv)-1):
      ds,open_,high,low,close,volume,adjc = csv[bar].rstrip().split(',')
      open_,high,low,close,adjc = [float(x) for x in [open_,high,low,close,adjc]]
      if close != adjc:
        factor = adjc/close
        open_,high,low,close = [x*factor for x in [open_,high,low,close]]
      dt = datetime.datetime.strptime(ds,'%Y-%m-%d')
      self.append(dt,open_,high,low,close,volume)

This class first calls the super class initializer, it then proceeds to build the proper url string using the given symbol and dates. After retrieving the csv data from Yahoo using the url, it reverses the array so we get it in the order we desire. Then it loops through each day and adjusts the OHLC prices if necessary (as determined by the adjusted close) and stores them in lists.

Next, we need a way to demonstrate and test the code when it is run from the command line:

if __name__ == '__main__':
  q = YahooQuote('aapl','2011-01-01')              # download year to date Apple data
  print q                                          # print it out
  q = YahooQuote('orcl','2011-02-01','2011-02-28') # download Oracle data for February 2011
  q.write_csv('orcl.csv')                          # save it to disk
  q = Quote()                                      # create a generic quote object
  q.read_csv('orcl.csv')                           # populate it with our previously saved data
  print q                                          # print it out

That’s it! There isn’t much there is there? That is mostly because of the power of Python. If there are any bits you don’t understand, they should be very easy to figure out with the Python docs. If all else fails, you can always ask me.

Finally, here is a link to the complete file: yahoo.py

  16 Responses to “Downloading Yahoo Finance historical data with Python”

  1. […] Downloading Yahoo Finance historical data with Python Share this:FacebookTwitterGoogle +1Like this:Like Loading… This entry was posted in 程式設計 and tagged Python, 程式設計. Bookmark the permalink. ← [轉貼] Skanect 1.0 Tutorial – Quick Self-Scan […]

  2. Good Morning Mark,

    My exposure to programming is limited to IBM 1401 Autocoder in the late 1960’s, a C++ class I took in 2007 and a little tinkering around with online tutorials. I’d like to learn Python as it appears to be simpler than other languages. Your tutorial is ideal as I also have an interest in trading and in things financial. Any chance you have, or will, be upgrading this tutorial to Python 3? Regardless, thanks for your contribution to guys like me who have a desire to learn something new and exciting.

    • Thanks for the kind comment Paul. At this time I do not have any plans to upgrade to Python 3. I have tons more content planned, I just have not had the time to update the blog in quite a while.

  3. Hi, Mark –

    thank you for sharing with us.
    I am trying to use the code in python 3.3.3. the initial code analysis seems accepted all but the xrange(). it says xrange() as an undefined name, I assume this is because of version incompatibility. any way to get it worked around?

  4. Hi, Mark –

    thank you for sharing with us.
    I am trying to use the code in python 3.3.3. the initial code analysis seems accepted all but the xrange(). it says xrange() as an undefined name, I assume this is because of version incompatibility. any way to get it worked around?
    TIA
    Yun

  5. First thank you so much for such a great tutorial. If I can pick your knowledge for a second. How would I make it so, Instead of enter 2 dates to get historical data. I want it the script to grab todays date, than make the 2nd date 60 days back. So every time I run it it grabs current date and last 2 months.

    Thank you once again.

    Tyler

  6. I am very new to Python. How can I change this to make it monthly stock prices instead of daily? I’d like to be able to grab the last 36 months worth of prices for a wide range of mutual funds.

    • You should be able to modify the script to add “&g=m” to the end of the url. For example, add:

      url_string += “&g=m”

      right after the last url_string statement. Likewise, “&g=w” should work for weekly. Good luck!

      • I tried switching that and it isn’t working. I’m trying to remove xxx_day where it is. This is what I’ve come up with. It results in a value error where there are too many values to unpack

        import urllib,time,datetime

        class Quote(object):

        DATE_FMT = ‘%Y-%m-%d’
        TIME_FMT = ‘%H:%M:%S’

        def __init__(self):
        self.symbol = ”
        self.date,self.time,self.open_,self.high,self.low,self.close,self.volume = ([] for _ in range(7))

        def append(self,dt,open_,high,low,close,volume):
        self.date.append(dt.date())
        self.time.append(dt.time())
        self.open_.append(float(open_))
        self.high.append(float(high))
        self.low.append(float(low))
        self.close.append(float(close))
        self.volume.append(int(volume))

        def to_csv(self):
        return ”.join([“{0},{1},{2},{3:.2f},{4:.2f},{5:.2f},{6:.2f},{7}\n”.format(self.symbol,
        self.date[bar].strftime(‘%Y-%m-%d’),self.time[bar].strftime(‘%H:%M:%S’),
        self.open_[bar],self.high[bar],self.low[bar],self.close[bar],self.volume[bar])
        for bar in xrange(len(self.close))])

        def write_csv(self,filename):
        with open(filename,’w’) as f:
        f.write(self.to_csv())

        def read_csv(self,filename):
        self.symbol = ”
        self.date,self.time,self.open_,self.high,self.low,self.close,self.volume = ([] for _ in range(7))
        for line in open(filename,’r’):
        symbol,ds,ts,open_,high,low,close,volume = line.rstrip().split(‘,’)
        self.symbol = symbol
        dt = datetime.datetime.strptime(ds+’ ‘+ts,self.DATE_FMT+’ ‘+self.TIME_FMT)
        self.append(dt,open_,high,low,close,volume)
        return True

        def __repr__(self):
        return self.to_csv()

        class YahooQuote(Quote):
        ”’ Daily quotes from Yahoo. Date format=’yyyy-mm-dd’ ”’
        def __init__(self,symbol,start_date,end_date=datetime.date.today().isoformat()):
        super(YahooQuote,self).__init__()
        self.symbol = symbol.upper()
        start_year,start_month = start_date.split(‘-‘)
        start_month = str(int(start_month)-1)
        end_year,end_month = end_date.split(‘-‘)
        end_month = str(int(end_month)-1)
        url_string = “http://ichart.finance.yahoo.com/table.csv?s={0}”.format(symbol)
        url_string += “&a={0}&b={1}”.format(start_month,start_year)
        url_string += “&m={0}&f={1}”.format(end_month,end_year)
        csv = urllib.urlopen(url_string).readlines()
        csv.reverse()
        for bar in xrange(0,len(csv)-1):
        ds,open_,high,low,close,volume,adjc = csv[bar].rstrip().split(‘,’)
        open_,high,low,close,adjc = [float(x) for x in [open_,high,low,close,adjc]]
        if close != adjc:
        factor = adjc/close
        open_,high,low,close = [x*factor for x in [open_,high,low,close]]
        dt = datetime.datetime.strptime(ds,’%Y-%m-%d’)
        self.append(dt,open_,high,low,close,volume)

        if __name__ == ‘__main__’:
        q = YahooQuote(‘aapl’,’2011-01′) # download year to date Apple data
        print q # print it out
        q = YahooQuote(‘tsla’,’2011-01′,’2011-28′) # download Oracle data for February 2011
        q.write_csv(‘tsla.csv’) # save it to disk
        q = Quote() # create a generic quote object
        q.read_csv(‘tsla.csv’) # populate it with our previously saved data
        print q # print it out

        • You still need to pass in a full date. If you add the line suggested above like:


          url_string = “http://ichart.finance.yahoo.com/table.csv?s={0}”.format(symbol)
          url_string += “&a={0}&b={1}”.format(start_month,start_year)
          url_string += “&m={0}&f={1}”.format(end_month,end_year)
          url_string += “&g=m”

          and then call it like:

          q = YahooQuote(‘aapl’,’2011-01-01′)

          it should work.

          • This now gives me an error of an unexpected indent on the url_string += “&g=m”. Thank you for helping me work through this by the way.

          • Python is whitespace sensitive. You can use either tabs or spaces, but you must be consistent in how you use them. I suggest looking up some basic python tutorials. There are many excellent ones available for free.

          • I understand the whitespace attribute for python. I’m just not understanding why when I add the url_string += “&g=m” line of code with the same indent as the three previous lines it gives me that error.

  7. Thank you very much for your sharing. It seems that we still do not have full intraday data from both Yahoo and Google.

    I am doing some data mining on stock and currency market which requires exact opening and closing time for each day. I can deal with stock (fetched from Yahoo) by identifying opening and closing time of each stock exchange with corresponding time zone + DST.

    However, I cannot do this with currency. For example in https://finance.yahoo.com/q?s=EUR=X, we get Prev Close: 0.8806 and Open: 0.8806, I do not know the exact time for opening and closing for these two numbers, I also do not know from which currency exchange in the world they was obtained, (while forex are traded 24/(5+weekdays) as I know).

    Could you please give me some advice? Thank you so much.

 Leave a Reply

(required)

(required)