Mark Chenoweth

Dec 19, 2011

So far, in my previous posts, we have been downloading DAILY stock prices from the well-known Yahoo and Google sites. Both offer very easy access to free daily historical data, along with nice charts of the stock prices. But some of you may have noticed that Google charts not only daily prices but also INTRADAY prices, down to two-minute resolution. However, there are no published links to get the raw data used to build the charts. Bummer. Let’s do a little investigating and see what we can figure out.

For those of you using the excellent Chrome browser, it has a built-in utility called Developer Tools to examine virtually all aspects of a web page (or site). It can be started by clicking on the wrench icon in the upper right corner, selecting Tools, then Developer Tools (or simply Ctrl+Shift+I). This utility has an enormous amount of functionality, but we are mainly interested in any network traffic a page generates. So let’s start by pulling up the Google Finance summary page for AAPL. Then start the developer tools and click on the Network tab. Finally, on the chart for AAPL, click on the 2min interval link. At the bottom of the page will be a list of all the network requests that were generated by clicking on the 2min link. Looking at them, there is one called getprices. That looks promising! Now click on that link and then the “response” tab along the top to see what the server response was to the request. Here is what my browser looked like:

The data we are interested in looks like this:


Now we are getting somewhere! That sure looks like price and volume data, along with some sort of time stamp and a list of offsets. Here is what the link looked like:,c,v,o,h,l&df=cpct&auto=1&ts=1324323553905

If we copy that link and put it directly in the url bar of our browser, we can get the full data set. It appears to have some header data followed by a list of prices. The dates appear to consist of a periodic time value followed by a series of offsets.

So there are two things to figure out. First we need to decipher the url string just like we did for the daily data, and then we need to decode the time stamp data in the returned results.

Let’s tackle the url string first. We will do it just like the other two previous attempts. We end up with:

  • The base url is
  • q is the symbol (AAPL)
  • x is the exchange (NASD)
  • i is the interval in seconds (120 seconds = 2 minutes)
  • sessions is the session requested (ext_hours)
  • p is the time period (5d = 5 days)
  • f is the requested fields (d,c,v,o,h,l)
  • df ?? (cpct)
  • auto ?? (1)
  • ts is potentially a time stamp, in milliseconds (1324323553905)
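Putting the useful pieces together, the query string can be assembled programmatically. Here is a minimal sketch in Python 3 syntax (the getprices path is the one observed in Developer Tools above; the helper name is my own):

```python
from urllib.parse import urlencode

def build_getprices_url(symbol, interval_seconds=120, num_days=5,
                        fields='d,c,v,o,h,l'):
    # Base path observed in the Developer Tools network log
    base = 'http://www.google.com/finance/getprices'
    # safe=',' keeps the commas in the f= field list readable
    params = urlencode({'q': symbol, 'i': interval_seconds,
                        'p': '{0}d'.format(num_days), 'f': fields}, safe=',')
    return '{0}?{1}'.format(base, params)

print(build_getprices_url('AAPL'))
# http://www.google.com/finance/getprices?q=AAPL&i=120&p=5d&f=d,c,v,o,h,l
```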

Again, I spent some time playing around with the query parameters and found that you can get away with only q, i, p, and f. A simplified url (to get 10 days of 5 minute AAPL data) looks like this:,o,h,l,c,v

Go ahead and click on the link to try it. Now we can download intraday data for a specific period back in time. There may be a way to get it between any two dates, but I haven’t figured that out yet. If anyone knows, I sure would be interested in finding out how.

The next problem is to figure out how to deal with the weird date formats. In the data returned from the example above, I got “a1323181800” in the first date field and, moving down, the last “a” prefixed number displayed was “a1324045800”. Those look suspiciously like Unix time stamps to me. All major programming languages support them, and they are simply the number of seconds since 1/1/1970. Again, using my friend Google, I found an online Unix time stamp converter and plugged in the above numbers (without the “a”). I got “Tue, 06 Dec 2011 14:30:00 GMT” for the first and “Fri, 16 Dec 2011 14:30:00 GMT” for the second. Bingo! They are Unix time stamps. And the data appears to be listed with the oldest first. So, to get the date for each row, we convert any “a” prefixed time to its Unix time, then for each succeeding row we add its offset multiplied by the interval in seconds to the last recorded Unix time, until we hit a new “a” prefixed time. It sounds a lot more complicated than it really is.
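The decoding rule can be sketched in a few lines. This is Python 3 syntax (the class below sticks to the Python 2 style used throughout these posts), and the sample prices are made up to match the observed format:

```python
import datetime

# Decode getprices-style date fields: an 'a'-prefixed value is an absolute
# Unix time stamp; a bare value is an offset in intervals from the last stamp.
def decode_times(rows, interval_seconds):
    times, day = [], 0
    for row in rows:
        stamp = row.split(',')[0]
        if stamp.startswith('a'):      # new absolute Unix time stamp
            day, offset = int(stamp[1:]), 0
        else:                          # offset from the last 'a' stamp
            offset = int(stamp)
        times.append(datetime.datetime.fromtimestamp(
            day + interval_seconds * offset, datetime.timezone.utc))
    return times

# Two sample rows in the observed format (prices are made up):
sample = ['a1323181800,391.0,390.1,392.0,389.9,1000',
          '1,391.5,391.0,392.2,390.8,900']
print(decode_times(sample, 120)[0])  # 2011-12-06 14:30:00+00:00
```

The first bar lands on Tue, 06 Dec 2011 14:30:00 GMT, matching the online converter, and the second bar is one 120-second interval later.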

Since I am only interested in the DOHLC values, I am going to ignore the header fields in our dataset. And since this post is getting pretty long, I will present the Python class I wrote to parse the data:

import urllib
import datetime

class GoogleIntradayQuote(Quote):
  ''' Intraday quotes from Google. Specify interval seconds and number of days '''
  def __init__(self,symbol,interval_seconds=300,num_days=5):
    super(GoogleIntradayQuote,self).__init__()
    self.symbol = symbol.upper()
    url_string = "http://www.google.com/finance/getprices?q={0}".format(self.symbol)
    url_string += "&i={0}&p={1}d&f=d,o,h,l,c,v".format(interval_seconds,num_days)
    csv = urllib.urlopen(url_string).readlines()
    for bar in xrange(7,len(csv)):                 # skip the seven header lines
      if csv[bar].count(',')!=5: continue
      offset,close,high,low,open_,volume = csv[bar].rstrip().split(',')
      if offset[0]=='a':                           # 'a' prefix marks a Unix time stamp
        day = float(offset[1:])
        offset = 0
      else:                                        # otherwise an offset in intervals
        offset = float(offset)
      open_,high,low,close = [float(x) for x in [open_,high,low,close]]
      dt = datetime.datetime.fromtimestamp(day+(interval_seconds*offset))
      self.append(dt,open_,high,low,close,volume)

It should be pretty self explanatory. The main difference between this class and the others is that it takes an interval (in seconds) and a lookback (in days) instead of a start/end time. I skip the first seven header lines, then check for date fields prefixed with an “a”. I keep track of the offset and calculate the correct date with the line:

dt = datetime.datetime.fromtimestamp(day+(interval_seconds*offset))

There is one last tricky part: notice that the OHLC data is listed in a non-standard format. It is actually returned as CHLO, no matter what order the f fields are listed in.

It should be noted that only the most recent 10-20 days of intraday data appear to be available at any given time. But hey, what do you want for nothing! Later on, we will try to find some interesting things to do with this data within these limitations.

And finally, a complete example:

Dec 19, 2011

Now that we have a very simple function to download historical stock data from Yahoo, let’s see if we can use the same technique to download data from Google Finance.

If we head over to the main page for Google Finance and pull up a quote for AAPL, we will see a link for historical prices in the upper left corner. Clicking on that link will bring up the historical prices page. By default, the page shows a nicely formatted table of prices with the latest price at the top. Notice that there is an option to pick any date range for the prices, and, just like Yahoo, there is a link to download the data to a spreadsheet. Now we pick a date range (in this case, Nov 1st – Nov 30th) and click the update button. Then we can download the data using the link on the right side of the page. By default it downloads as “data.csv”. We end up with the following data:


This data looks very similar to the Yahoo data, but without the adjusted close. It appears that Google adjusts the prices for us, so we will not have to. Also, notice the dates are formatted differently. It should be very easy to convert to our standard format that we defined in our Quote class in the previous post.

Now we need to decipher the link used to get the data. We copy the link by right clicking it and selecting “Copy link address” and end up with the following:

We have the query options: cid, startdate, enddate, num, and output. Take a look at the date values. What’s up with the percent signs? Anytime you see “%” signs in a url, it means it has been url encoded. A quick Google search turns up many sites that will encode and decode url strings. Decoding the raw url string, we end up with: 1, 2011&enddate=Nov 30, 2011&num=30&output=csv
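Python can do the decoding for us as well. A tiny example (urllib.parse is Python 3; in the Python 2 used elsewhere in these posts it would be urllib.unquote):

```python
from urllib.parse import unquote

# %20 decodes to a space and %2C to a comma
encoded = 'startdate=Nov%201%2C%202011&enddate=Nov%2030%2C%202011'
print(unquote(encoded))  # startdate=Nov 1, 2011&enddate=Nov 30, 2011
```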

That looks much easier to figure out.

We now have:

  • cid = 22144
  • startdate = Nov 1, 2011
  • enddate = Nov 30, 2011
  • num = 30
  • output = csv

There is only one problem. What is the cid? Probably some sort of database id. But by observing the query string for the main page, we can see that it uses a “q” option that takes the exchange and symbol name. By playing around with different combinations of these parameters, I found that we can use the q option without the exchange, and the num option does not seem to be necessary. We can simplify the url string to: 1, 2011&enddate=Nov 30, 2011&output=csv

The only thing left to do now is to subclass our original Quote class and customize it to build our url string and convert the dates to our common format. Here is what I ended up with:

class GoogleQuote(Quote):
  ''' Daily quotes from Google. Date format='yyyy-mm-dd' '''
  def __init__(self,symbol,start_date,end_date=datetime.date.today().isoformat()):
    super(GoogleQuote,self).__init__()
    self.symbol = symbol.upper()
    start = datetime.date(int(start_date[0:4]),int(start_date[5:7]),int(start_date[8:10]))
    end = datetime.date(int(end_date[0:4]),int(end_date[5:7]),int(end_date[8:10]))
    url_string = "http://www.google.com/finance/historical?q={0}".format(self.symbol)
    url_string += "&startdate={0}&enddate={1}&output=csv".format(
                      start.strftime('%b %d, %Y'),end.strftime('%b %d, %Y'))
    csv = urllib.urlopen(url_string).readlines()
    for bar in xrange(len(csv)-1,0,-1):   # skip the header row; newest data is first
      ds,open_,high,low,close,volume = csv[bar].rstrip().split(',')
      open_,high,low,close = [float(x) for x in [open_,high,low,close]]
      dt = datetime.datetime.strptime(ds,'%d-%b-%y')
      self.append(dt,open_,high,low,close,volume)

This one is even simpler than the Yahoo version. There is no need to calculate adjusted prices; the main differences are the url format and the date formatting.
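The date conversion boils down to a single strptime/strftime round trip. For example (Python 3 syntax; the format strings match the ones used in the class above):

```python
import datetime

# Google's csv dates look like '30-Nov-11'; our Quote class stores '%Y-%m-%d'
dt = datetime.datetime.strptime('30-Nov-11', '%d-%b-%y')
print(dt.strftime('%Y-%m-%d'))  # 2011-11-30
```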

You can download a file with the Quote class and a few command prompt tests here:

In a future post, we will consolidate all of these classes and create a Python module/command line utility to download historical data from these various sources and save it to disk in a consistent format. Stay tuned!

Dec 03, 2011

In the previous post, we learned to manually build a url to download historical data for any stock symbol between any two dates. Now we want to build a set of simple Python classes to retrieve the data and store it in a consistent manner.

However, there is one more thing to figure out. Some of you may have noticed that there is an extra field (Adj Close) in addition to the normal Open/High/Low/Close/Volume data associated with a daily stock price. It turns out that stock prices can be adjusted for either dividends paid out or stock splits. For example, if a stock is trading at $500, a company board may decide to create 5 shares of $100 stock for each original $500 share. That way it will be easier for some investors to buy the stock. The day after this happens, the quoted price will drop from $500 to $100, but the company will still be worth the same (with five times as many shares). Yahoo data includes the actual OHLC prices on the date they were published, as well as an Adjusted Close price. Normally, for the most recent prices, the close price is the same as the adjusted close price and nothing needs to be done. But for data that was published before the most recent split or dividend, the two prices will be different. In order to get a consistent representation of the price for those days, the OHLC prices must be multiplied by the adjusted close divided by the close for that day. You will end up with a new set of prices for that day that differ from what was actually published, but are proportionally accurate relative to today’s prices. This is essential for back testing trading systems accurately.
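Here is the adjustment worked through with the 5-for-1 split example above (the OHLC numbers are illustrative):

```python
# Pre-split day: published close 500.00, adjusted close 100.00 after a
# 5-for-1 split. Scale the whole OHLC bar by adjc/close so its prices
# are proportional to today's quotes.
open_, high, low, close, adjc = 495.0, 505.0, 490.0, 500.0, 100.0
factor = adjc / close                          # 0.2
open_, high, low, close = [x * factor for x in [open_, high, low, close]]
print([round(x, 2) for x in [open_, high, low, close]])  # [99.0, 101.0, 98.0, 100.0]
```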

Now we can finally start writing some code! Let’s think about what we want to accomplish.

  • We want to save the symbol name
  • We want to save the date and time. (Time is not required for daily quotes, but we will be adding an intraday source later.)
  • We want to save the Open, High, Low, Close, and Volume (OHLCV) values for each day.
  • We want to save the OHLCV as Lists (arrays) in memory that are easy to manipulate.
  • We want the ability to save the data locally in comma separated files.
  • We want to be able to read our local files back into memory.
  • We want consistent date/time formatting.
  • We want the prices ordered from oldest to newest to make it easy on our later calculations.

First we will create a base class that contains all of the common functions we require.

import urllib
import datetime

class Quote(object):
  DATE_FMT = '%Y-%m-%d'
  TIME_FMT = '%H:%M:%S'

  def __init__(self):
    self.symbol = ''
    self.date,self.time,self.open_,self.high,self.low,self.close,self.volume = ([] for _ in range(7))

  def append(self,dt,open_,high,low,close,volume):
    self.date.append(dt.date())
    self.time.append(dt.time())
    self.open_.append(float(open_))
    self.high.append(float(high))
    self.low.append(float(low))
    self.close.append(float(close))
    self.volume.append(int(volume))

  def to_csv(self):
    return ''.join(["{0},{1},{2},{3:.2f},{4:.2f},{5:.2f},{6:.2f},{7}\n".format(self.symbol,
              self.date[bar].strftime(self.DATE_FMT),self.time[bar].strftime(self.TIME_FMT),
              self.open_[bar],self.high[bar],self.low[bar],self.close[bar],self.volume[bar])
              for bar in xrange(len(self.close))])

  def write_csv(self,filename):
    with open(filename,'w') as f:
      f.write(self.to_csv())

  def read_csv(self,filename):
    self.symbol = ''
    self.date,self.time,self.open_,self.high,self.low,self.close,self.volume = ([] for _ in range(7))
    for line in open(filename,'r'):
      symbol,ds,ts,open_,high,low,close,volume = line.rstrip().split(',')
      self.symbol = symbol
      dt = datetime.datetime.strptime(ds+' '+ts,self.DATE_FMT+' '+self.TIME_FMT)
      self.append(dt,open_,high,low,close,volume)
    return True

  def __repr__(self):
    return self.to_csv()

The __init__ method creates class attributes that will store our data. There are two string constants that will be used later to format dates and times consistently.
The append method takes a python datetime object and the OHLCV values and appends them to the end of the lists created above.
The csv methods should be self explanatory. They convert the python lists to csv format and vice versa. They also handle reading and writing the data to disk for permanent storage.
For those of you new to python, the __repr__ function returns a printable representation of an object.

Now it is time to subclass the Quote class and customize it to download our Yahoo data.

class YahooQuote(Quote):
  ''' Daily quotes from Yahoo. Date format='yyyy-mm-dd' '''
  def __init__(self,symbol,start_date,end_date=datetime.date.today().isoformat()):
    super(YahooQuote,self).__init__()
    self.symbol = symbol.upper()
    start_year,start_month,start_day = start_date.split('-')
    start_month = str(int(start_month)-1)   # Yahoo months are zero-based
    end_year,end_month,end_day = end_date.split('-')
    end_month = str(int(end_month)-1)
    url_string = "http://ichart.finance.yahoo.com/table.csv?s={0}".format(symbol)
    url_string += "&a={0}&b={1}&c={2}".format(start_month,start_day,start_year)
    url_string += "&d={0}&e={1}&f={2}".format(end_month,end_day,end_year)
    csv = urllib.urlopen(url_string).readlines()
    for bar in xrange(len(csv)-1,0,-1):     # skip the header row; newest data is first
      ds,open_,high,low,close,volume,adjc = csv[bar].rstrip().split(',')
      open_,high,low,close,adjc = [float(x) for x in [open_,high,low,close,adjc]]
      if close != adjc:                     # adjust for splits and dividends
        factor = adjc/close
        open_,high,low,close = [x*factor for x in [open_,high,low,close]]
      dt = datetime.datetime.strptime(ds,'%Y-%m-%d')
      self.append(dt,open_,high,low,close,volume)

This class first calls the superclass initializer, then builds the proper url string using the given symbol and dates. After retrieving the csv data from Yahoo, it loops through the rows in reverse so we get them in the order we desire (oldest first). For each day it adjusts the OHLC prices if necessary (as determined by the adjusted close) and stores them in the lists.

Next, we need a way to demonstrate and test the code when it is run from the command line:

if __name__ == '__main__':
  q = YahooQuote('aapl','2011-01-01')              # download year to date Apple data
  print q                                          # print it out
  q = YahooQuote('orcl','2011-02-01','2011-02-28') # download Oracle data for February 2011
  q.write_csv('orcl.csv')                          # save it to disk
  q = Quote()                                      # create a generic quote object
  q.read_csv('orcl.csv')                           # populate it with our previously saved data
  print q                                          # print it out

That’s it! There isn’t much there, is there? That is mostly because of the power of Python. If there are any bits you don’t understand, they should be very easy to figure out with the Python docs. If all else fails, you can always ask me.

Finally, here is a link to the complete file:

Dec 03, 2011

In order to create and run a trading system, the first thing you need is some historical data to work with. Fortunately, there are several excellent free sources available on the internet. Probably the most popular is from the Yahoo Finance site.

Normally, in order to access the data, you would enter in a stock symbol and pull up the overview page for a company. For example, if we pull up the page for Apple Inc., you will see a link on the left side for Historical Prices. Clicking on that link will bring up a page with a nicely formatted table of the most recent prices for Apple. At the bottom of that page, there is a link to download all the prices as a comma separated file that is suitable for importing into a spreadsheet. The downloaded file is formatted like this:

Date,Open,High,Low,Close,Volume,Adj Close

The data is very useful, but it is not very easy to download and manage lots of different symbols for different date ranges.

Suppose we want to automate this process. If we could figure out the url format, we could automatically generate new urls and download the data via a Python script.

Let’s start by examining the url that generates the download data. In this example, it is from September 7th, 1984 to December 3rd, 2011:

It looks pretty easy to decipher. There is the main part of the url, followed by a number of query options. They are:

s=AAPL, d=11, e=3, f=2011, g=d, a=8, b=7, c=1984, ignore=.csv

  • s is obviously the symbol
  • d appears to be the end month minus one
  • e is the end day
  • f is the end year
  • g is unclear. Perhaps d is for daily?
  • a appears to be the start month minus one
  • b is the start day
  • c is the start year
  • ignore=.csv appears to specify csv as the output format
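The zero-based months are the only real gotcha. Here is a quick sketch of turning two ISO dates into those query options (the helper name is my own, and this is Python 3 syntax):

```python
# Turn ISO dates into the a/b/c/d/e/f query options deciphered above.
# a and d are the months minus one (Yahoo months are zero-based).
def yahoo_params(symbol, start_date, end_date):
    sy, sm, sd = start_date.split('-')
    ey, em, ed = end_date.split('-')
    return 's={0}&a={1}&b={2}&c={3}&d={4}&e={5}&f={6}'.format(
        symbol, int(sm) - 1, int(sd), sy, int(em) - 1, int(ed), ey)

print(yahoo_params('AAPL', '1984-09-07', '2011-12-03'))
# s=AAPL&a=8&b=7&c=1984&d=11&e=3&f=2011
```

Note that the output matches the query values we saw in the downloaded url for the same date range.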

If we play around, building some new urls by hand with different query values and testing them out, we find that we can leave the g and ignore options out and the downloaded files are identical.

So, now we know how to build a url to download historical data for any stock; all we need is the stock symbol and the start/end dates.

In the next post, we will create a set of Python classes to easily download and store the data in a consistent way that will be useful for our trading system.

Dec 03, 2011

Here is a rough idea of what I have planned for upcoming posts. I intend to start by developing a very simple stock market trading system from scratch, implemented in Python. It will have the following components:

  • Accessing free historical data from Yahoo and Google (daily and intraday!)
  • Storing and retrieving the data in files using a common format
  • Simple indicators from scratch (such as moving averages, atr, etc.)
  • A very simple trading system that has various order types and accounts for slippage
  • Simple trading system logs and statistics (%wins/losses, drawdown, expectancy, etc.)
  • A Python interface to a complete indicator package (TA-LIB)

After the basic system is developed, I will demonstrate how to implement very simple Genetic Algorithms and Neural Networks and use them to optimize trading systems. While discussing Genetic Algorithms, I will cover many different fitness functions that can be used to test if the algorithms are getting the answers they should be.

I will then explore some topics that are extremely interesting to me at the moment: Genetic Programming and Gene Expression Programming. These techniques can be used to automatically discover new trading systems, new indicators, or to do time series prediction. I think you will be amazed by how powerful these algorithms are and how simple it can be to implement them.

There are a number of other things that I may touch on including: Analyzing market breadth, the Forex market, various forms of 2d and 3d visualization, book reviews, interesting academic papers, or anything else market related. Obviously, I have a wide range of interests, and I tend to get bored easily. So please bear with me.

As always, any and all feedback will be appreciated.

Dec 03, 2011

Welcome to my latest project! In this blog I intend to create a series of posts and tutorials on some of my favorite subjects: Trading Systems, Intelligent Algorithms, and Visualization. I have been programming computers for over 30 years (anyone remember the VIC-20?) and am obsessed with learning the latest technologies and applying them to what interests me.

I fully realize that there are many excellent existing projects out there that cover the topics I plan on covering, and I have learned much from them. However, I believe in order to fully understand something, you must do it yourself from scratch. I also believe in simplicity. Many projects are extremely large and complex and have teams of people developing them. While there is certainly a need for projects such as those, it can be extremely difficult to learn about the fundamentals of a topic from such a code base.

Therefore, the code I will be posting (to paraphrase Einstein) will be as simple as possible, but not simpler. It is intended to be short, easy to understand, and a starting point for your own customizations and explorations. It will represent just one of many ways to accomplish the same task. Of course, I am also very interested in any feedback, bug fixes, or improvements that you may have.

While the code I write may look simple, that is a function of the tools I use. These days there are many very high quality programming languages and utilities that can do an enormous amount of work in very little code. When someone writes a trading system from scratch, the result will be very different when implemented in C vs. Python. I prefer to use the best tool for the job at hand, so you will see a variety of languages and tools used. Currently, Python is my favorite general purpose language, although Ruby is a close second. All of the languages and utilities I use are free, open source, and run on multiple platforms (at least OSX, Linux, and Windows). Most code will work with a default installation. However, sometimes a library may need to be installed. I will note when that is necessary.

One last note. While I strive to write simple code, I am going to focus on the algorithms involved and will assume that my readers have at least a basic knowledge of computer programming. If you are completely new to programming, there are many other excellent tutorials out there to get you started. Google is your friend.