Dec 03 2011

In the previous post, we learned to manually build a url to download historical data for any stock symbol between any two dates. Now we want to build a set of simple Python classes to retrieve the data and store it in a consistent manner.

However, there is one more thing to figure out. Some of you may have noticed that there is an extra field (Adj Close) in addition to the normal Open/High/Low/Close/Volume data associated with a daily stock price. It turns out that stock prices can be adjusted for dividends paid out and for stock splits. For example, if a stock is trading at $500, the company’s board may decide to replace each original $500 share with 5 shares worth $100 each. That makes it easier for some investors to buy the stock. The day after this happens, the quoted price will drop from $500 to $100, but the company will still be worth the same (with five times as many shares).

Yahoo data includes the actual OHLC prices as they were published on each date, as well as an Adjusted Close price. For the most recent prices, the close is normally the same as the adjusted close and nothing needs to be done. But for data published before the most recent split or dividend, the two prices will differ. To get a consistent representation of the price for those days, each of the OHLC prices must be multiplied by that day’s adjusted close divided by that day’s close. You end up with a set of prices that differ from what was published on that day but are proportionally accurate relative to today’s prices. This is essential for back testing trading systems accurately.
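
The adjustment is easier to see with numbers. Here is a minimal sketch, assuming one day of prices arrives as plain floats. The helper name adjust_ohlc and the sample values are made up for illustration and are not real Yahoo data.

```python
def adjust_ohlc(open_, high, low, close, adj_close):
    """Scale one day's OHLC prices by adj_close / close.

    Hypothetical helper for illustration only.
    """
    if close == adj_close:
        # Recent data: the published prices are already consistent.
        return open_, high, low, close
    factor = adj_close / close
    return tuple(price * factor for price in (open_, high, low, close))


# Made-up bar published before a 5-for-1 split: close 500, adjusted close 100.
adjusted = adjust_ohlc(490.0, 510.0, 480.0, 500.0, 100.0)
```

Every price for that day is scaled by the same factor (0.2 here), so the shape of the bar is preserved while its level lines up with today’s prices.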

Now we can finally start writing some code! Let’s think about what we want to accomplish.

  • We want to save the symbol name
  • We want to save the date and time. (Time is not required for daily quotes, but we will be adding an intraday source later.)
  • We want to save the Open, High, Low, Close, and Volume (OHLCV) values for each day.
  • We want to save the OHLCV as Lists (arrays) in memory that are easy to manipulate.
  • We want the ability to save the data locally in comma separated files.
  • We want to be able to read our local files back into memory.
  • We want consistent date/time formatting.
  • We want the prices ordered from oldest to newest to make it easy on our later calculations.

First we will create a base class that contains all of the common functions we require.

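import urllib,datetime

class Quote(object):
  DATE_FMT = '%Y-%m-%d'
  TIME_FMT = '%H:%M:%S'
  def __init__(self):
    self.symbol = ''
    self.date,self.time,self.open_,self.high,self.low,self.close,self.volume = ([] for _ in range(7))

  def append(self,dt,open_,high,low,close,volume):
    # split the datetime and coerce the values so each list holds a single type
    self.date.append(dt.date())
    self.time.append(dt.time())
    self.open_.append(float(open_))
    self.high.append(float(high))
    self.low.append(float(low))
    self.close.append(float(close))
    self.volume.append(int(volume))

  def to_csv(self):
    # one line per bar: symbol,date,time,open,high,low,close,volume
    return ''.join(["{0},{1},{2},{3:.2f},{4:.2f},{5:.2f},{6:.2f},{7}\n".format(self.symbol,
              self.date[bar].strftime(self.DATE_FMT),self.time[bar].strftime(self.TIME_FMT),
              self.open_[bar],self.high[bar],self.low[bar],self.close[bar],self.volume[bar])
              for bar in xrange(len(self.close))])

  def write_csv(self,filename):
    with open(filename,'w') as f:
      f.write(self.to_csv())

  def read_csv(self,filename):
    # start from empty lists so old data is not mixed with the file contents
    self.symbol = ''
    self.date,self.time,self.open_,self.high,self.low,self.close,self.volume = ([] for _ in range(7))
    for line in open(filename,'r'):
      symbol,ds,ts,open_,high,low,close,volume = line.rstrip().split(',')
      self.symbol = symbol
      dt = datetime.datetime.strptime(ds+' '+ts,self.DATE_FMT+' '+self.TIME_FMT)
      self.append(dt,open_,high,low,close,volume)
    return True

  def __repr__(self):
    return self.to_csv()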

The __init__ method creates the instance attributes that will store our data. The two string constants defined on the class are used to format dates and times consistently.
The append method takes a Python datetime object and the OHLCV values and appends them to the end of the lists created above.
The csv methods should be self-explanatory. They convert the Python lists to csv format and vice versa, and they handle reading and writing the data to disk for permanent storage.
For those of you new to Python, the __repr__ method returns a printable representation of an object.
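
To see why two fixed format strings are enough for consistent date/time handling, here is a small sketch of the round trip between a datetime and the two text fields stored in the csv file. The function names format_stamp and parse_stamp are hypothetical and exist only for this example; the format strings are the same ones used by the Quote class.

```python
import datetime

DATE_FMT = '%Y-%m-%d'
TIME_FMT = '%H:%M:%S'


def format_stamp(dt):
    """Return the date and time strings written to the csv file."""
    return dt.strftime(DATE_FMT), dt.strftime(TIME_FMT)


def parse_stamp(ds, ts):
    """Rebuild a datetime from the two strings read back from the file."""
    return datetime.datetime.strptime(ds + ' ' + ts, DATE_FMT + ' ' + TIME_FMT)


# Daily quotes carry no time of day, so the time part is midnight.
original = datetime.datetime(2011, 12, 2)
ds, ts = format_stamp(original)
restored = parse_stamp(ds, ts)
```

Because the same formats are used in both directions, whatever is written to disk can always be parsed back into the same datetime.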

Now it is time to subclass the Quote class and customize it to download our Yahoo data.

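class YahooQuote(Quote):
  ''' Daily quotes from Yahoo. Date format='yyyy-mm-dd' '''
  def __init__(self,symbol,start_date,end_date=None):
    super(YahooQuote,self).__init__()
    if end_date is None:                             # default to today
      end_date = datetime.date.today().isoformat()
    self.symbol = symbol.upper()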
    start_year,start_month,start_day = start_date.split('-')
    start_month = str(int(start_month)-1)
    end_year,end_month,end_day = end_date.split('-')
    end_month = str(int(end_month)-1)
    url_string = "{0}".format(symbol)
    url_string += "&a={0}&b={1}&c={2}".format(start_month,start_day,start_year)
    url_string += "&d={0}&e={1}&f={2}".format(end_month,end_day,end_year)
    csv = urllib.urlopen(url_string).readlines()
    csv.reverse()                                    # oldest first; the header row is now last
    for bar in xrange(0,len(csv)-1):                 # stop before the header row
      ds,open_,high,low,close,volume,adjc = csv[bar].rstrip().split(',')
      open_,high,low,close,adjc = [float(x) for x in [open_,high,low,close,adjc]]
      if close != adjc:                              # rescale to the adjusted price level
        factor = adjc/close
        open_,high,low,close = [x*factor for x in [open_,high,low,close]]
      dt = datetime.datetime.strptime(ds,'%Y-%m-%d')
      self.append(dt,open_,high,low,close,volume)

This class first calls the super class initializer, then builds the proper url string from the given symbol and dates. After retrieving the csv data from Yahoo using the url, it reverses the array so we get the prices ordered from oldest to newest. Then it loops through each day, adjusts the OHLC prices if necessary (as determined by the adjusted close), and stores them in the lists.
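
The reverse-then-loop step can be tried without a network connection. The sketch below assumes the download arrives as a list of lines with the header first and the newest day first. The rows in SAMPLE are invented for illustration, and parse_rows is a hypothetical stand-in for the loop inside the class.

```python
# Invented rows in the downloaded layout: header first, newest day first.
SAMPLE = [
    "Date,Open,High,Low,Close,Volume,Adj Close\n",
    "2011-01-05,20.00,22.00,19.00,21.00,1000,21.00\n",
    "2011-01-04,40.00,42.00,38.00,40.00,500,20.00\n",
]


def parse_rows(lines):
    """Return (date, open, high, low, close, volume) tuples, oldest first."""
    rows = list(lines)
    rows.reverse()              # oldest first; the header is now the last line
    bars = []
    for line in rows[:-1]:      # everything except the header
        ds, open_, high, low, close, volume, adjc = line.rstrip().split(',')
        open_, high, low, close, adjc = [float(x) for x in (open_, high, low, close, adjc)]
        factor = adjc / close   # 1.0 when the day needs no adjustment
        bars.append((ds, open_ * factor, high * factor, low * factor,
                     close * factor, int(volume)))
    return bars


bars = parse_rows(SAMPLE)
```

After reversing, the header ends up as the last line, which is why the loop stops one short of the end instead of skipping the first line.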

Next, we need a way to demonstrate and test the code when it is run from the command line:

if __name__ == '__main__':
  q = YahooQuote('aapl','2011-01-01')              # download year to date Apple data
  print q                                          # print it out
  q = YahooQuote('orcl','2011-02-01','2011-02-28') # download Oracle data for February 2011
  q.write_csv('orcl.csv')                          # save it to disk
  q = Quote()                                      # create a generic quote object
  q.read_csv('orcl.csv')                           # populate it with our previously saved data
  print q                                          # print it out

That’s it! There isn’t much there, is there? That is mostly because of the power of Python. If there are any bits you don’t understand, they should be easy to figure out with the Python docs. If all else fails, you can always ask me.

Finally, here is a link to the complete file:

Dec 03 2011

In order to create and run a trading system, the first thing you need is some historical data to work with. Fortunately, there are several excellent free sources available on the internet. Probably the most popular is from the Yahoo Finance site.

Normally, in order to access the data, you would enter a stock symbol and pull up the overview page for a company. For example, if you pull up the page for Apple Inc., you will see a link on the left side for Historical Prices. Clicking on that link will bring up a page with a nicely formatted table of the most recent prices for Apple. At the bottom of that page, there is a link to download all the prices as a comma separated file that is suitable for importing into a spreadsheet. The downloaded file is formatted like this:

Date,Open,High,Low,Close,Volume,Adj Close

The data is very useful, but it is not very easy to download and manage lots of different symbols for different date ranges.

What if we wanted to automate this process? If we could figure out the url format, we could automatically generate new urls and download the data with a Python script.

Let’s start by examining the url that generates the download data. In this example, it is from September 7th, 1984 to December 3rd, 2011:

It looks pretty easy to decipher. There is the main part of the url, followed by a number of query options. They are:

s=AAPL, d=11, e=3, f=2011, g=d, a=8, b=7, c=1984, ignore=.csv

  • s is obviously the symbol
  • d appears to be the end month minus one
  • e is the end day
  • f is the end year
  • g is unclear. Perhaps d stands for daily?
  • a appears to be the start month minus one
  • b is the start day
  • c is the start year
  • ignore=.csv appears to specify csv as the output format
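
As a quick check of this reading of the options, here is a sketch that decodes them back into a symbol and two dates. The query string is simply the options listed above joined with &, and decode_query is a hypothetical helper written for this example.

```python
import datetime

# The query options from above, joined the way they would appear in a url.
QUERY = "s=AAPL&d=11&e=3&f=2011&g=d&a=8&b=7&c=1984&ignore=.csv"


def decode_query(query):
    """Return (symbol, start_date, end_date) from a query string."""
    opts = dict(pair.split('=', 1) for pair in query.split('&'))
    # Months appear to be stored minus one, so add one back.
    start = datetime.date(int(opts['c']), int(opts['a']) + 1, int(opts['b']))
    end = datetime.date(int(opts['f']), int(opts['d']) + 1, int(opts['e']))
    return opts['s'], start, end


symbol, start, end = decode_query(QUERY)
```

The decoded dates come out as September 7th, 1984 and December 3rd, 2011, which matches the date range of the example.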

If we play around and build some new urls by hand, replacing the query values with new data and testing them out, we find that we can leave out the g and ignore options and the downloaded files are identical.

So now we know how to build a url to download historical data for any stock, and all we need are the stock symbol and the start/end dates.
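
Going the other way, the query part of the url can be assembled from a symbol and two dates. This is only a sketch: build_query is a hypothetical helper, it assumes dates are given as 'yyyy-mm-dd' strings, and it leaves out the main part of the url, which would be placed in front of the result.

```python
def build_query(symbol, start_date, end_date):
    """Build the query options for a symbol and two 'yyyy-mm-dd' dates.

    Hypothetical helper for illustration; the g and ignore options are omitted.
    """
    start_year, start_month, start_day = [int(x) for x in start_date.split('-')]
    end_year, end_month, end_day = [int(x) for x in end_date.split('-')]
    return "s={0}&a={1}&b={2}&c={3}&d={4}&e={5}&f={6}".format(
        symbol.upper(),
        start_month - 1, start_day, start_year,   # start month is stored minus one
        end_month - 1, end_day, end_year)         # end month is stored minus one


query = build_query('aapl', '1984-09-07', '2011-12-03')
```

Converting the date parts to integers drops the leading zeros, so '09' becomes 8 after subtracting one, just as in the example options.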

In the next post, we will create a set of Python classes to easily download and store the data in a consistent way that will be useful for our trading system.

Dec 03 2011

Here is a rough idea of what I have planned for upcoming posts. I intend to start by developing a very simple stock market trading system from scratch, implemented in Python. It will have the following components:

  • Accessing free historical data from Yahoo and Google (daily and intraday!)
  • Storing and retrieving the data in files using a common format
  • Simple indicators from scratch (such as moving averages, ATR, etc.)
  • A very simple trading system that has various order types and accounts for slippage
  • Simple trading system logs and statistics (%wins/losses, drawdown, expectancy, etc.)
  • A Python interface to a complete indicator package (TA-Lib)

After the basic system is developed, I will demonstrate how to implement very simple Genetic Algorithms and Neural Networks and use them to optimize trading systems. While discussing Genetic Algorithms, I will cover many different fitness functions that can be used to test if the algorithms are getting the answers they should be.

I will then explore some topics that are extremely interesting to me at the moment: Genetic Programming and Gene Expression Programming. These techniques can be used to automatically discover new trading systems, new indicators, or to do time series prediction. I think you will be amazed by how powerful these algorithms are and how simple it can be to implement them.

There are a number of other things that I may touch on, including analyzing market breadth, the Forex market, various forms of 2D and 3D visualization, book reviews, interesting academic papers, or anything else market related. Obviously, I have a wide range of interests, and I tend to get bored easily. So please bear with me.

As always, any and all feedback will be appreciated.