
Where to get OHLCV data on cryptocurrencies?



  • Anyone have tips for where to get good OHLCV candle data for backtesting on cryptocurrencies?

    Are there any commercial offerings on the market?



  • Depends on:

    • if you want just BTC or also main altcoins (ETH, XMR, Ripple, etc.) or as many altcoins as possible including the "esoteric longtail"
    • the timeframe(s) you want
    • how far in the past you want to go
    • if aggregated market prices are ok or you target a specific exchange

    (I'm kinda assuming "as much as possible" for the first 3 questions since that would be my preference and the more data the merrier)

    For commercial offerings, you can check whether any of the companies/projects listed here are still alive: https://www.quora.com/Where-can-I-get-historical-raw-data-on-Bitcoin-price-and-volume-across-different-exchanges

    I'm personally using the Binance API to get 5min OHLCV data for all cryptos traded on Binance, which gives you ~200 pairs. It's free, offers timeframes down to 1min plus even real-time streaming of trades (not worth it IMO), and for OHLCV I was able to download the whole history back to the dawn of time (which for Binance is July 2017, when they started :) )
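    If you want to pull that same history yourself, here's a rough sketch against Binance's public `/api/v3/klines` REST endpoint (endpoint path and the first six kline fields are per Binance's docs; pagination and rate-limit handling are left out):

```python
import json
import datetime
import urllib.parse
import urllib.request

BINANCE_KLINES_URL = "https://api.binance.com/api/v3/klines"

def fetch_klines(symbol, interval="5m", limit=500, start_ms=None):
    """Fetch one page of candles from Binance's public klines endpoint.
    Paginate forward by passing the last close time + 1 as start_ms."""
    params = {"symbol": symbol, "interval": interval, "limit": limit}
    if start_ms is not None:
        params["startTime"] = start_ms
    url = BINANCE_KLINES_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

def klines_to_rows(raw_klines):
    """Each kline is an array whose first six fields are
    [open time (ms), open, high, low, close, base volume]."""
    return [
        (datetime.datetime.utcfromtimestamp(k[0] / 1000),
         float(k[1]), float(k[2]), float(k[3]), float(k[4]), float(k[5]))
        for k in raw_klines
    ]

# Usage (needs network): klines_to_rows(fetch_klines("BTCUSDT", "5m", limit=3))
```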

    For longer historical data and even more currencies I'm using Coinmarketcap via jhogan4288's Python scraper (https://github.com/jhogan4288/coinmarketcap-history) - very convenient for getting daily-timeframe data.
    Also, the Coinmarketcap API ticker updates every 5 minutes, so if you poll it at that frequency you could in theory distill ~1hour OHLCV data out of it, though it will take some time until you accumulate a reasonable history for backtesting. Keep in mind the prices are aggregated across all the various exchanges CMC tracks, including the shady ones - remember the case a couple of months back when CMC abruptly removed South Korean exchanges from its list and it caused a massive drop in coin prices (since South Korea was trading at such a huge premium).
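    A rough sketch of that distilling idea, assuming you've been saving (timestamp, price) snapshots from the ticker every 5 minutes and let pandas do the bucketing (note there's no real volume this way, and the high/low only reflect the sampled prices):

```python
import pandas as pd

def ticker_polls_to_ohlc(timestamps, prices, timeframe="1h"):
    """Bucket periodic price snapshots (e.g. a ticker polled every
    5 minutes) into OHLC bars. Treat it as a rough approximation:
    there is no volume, and intrabar extremes between polls are lost."""
    series = pd.Series(prices, index=pd.to_datetime(timestamps, unit="s"))
    return series.resample(timeframe).ohlc()

# Example: twelve 5-minute samples spanning one hour
ts = [1514764800 + 300 * i for i in range(12)]
px = [100, 101, 99, 102, 98, 103, 104, 102, 101, 105, 100, 99]
bars = ticker_polls_to_ohlc(ts, px)
```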

    I'm also collecting data from Bitstamp since I trade there, but Bitstamp doesn't have a proper OHLCV API, so I ended up polling a real-time stream of trades and reconstructing the OHLCV from that.

    Hope some of that helps :)

    T.



  • Thx a bunch for the pointers. Polling Bitstamp sounds very interesting, as I'm also trading there. Can you share the code you use to get the data and construct candles on Bitstamp?



  • @søren-pallesen sure, the polling code is using the Python Pusher Client library from here: https://github.com/ekulyk/PythonPusherClient

    It's rather rough and unpolished - I just drafted it together and it worked right away, so I haven't really spent much time polishing it since. But it has been running for 4 months now without a crash, which I consider almost surprising :)

    The script generates a new csv file every hour, writing one trade per line in the form <timestamp>;<trade price>;<trade amount>

    It actually polls two exchanges - Bitstamp and the Czech exchange Coinmate.io - so you might want to do some pruning to get rid of the latter...

    #!/usr/bin/env python
    
    import sys
    sys.path.append('..')
    
    import time
    import datetime
    import signal
    import pusherclient
    import json
    
    # Add a logging handler so we can see the raw communication data
    import logging
    root = logging.getLogger()
    root.setLevel(logging.INFO)
    ch = logging.StreamHandler(sys.stdout)
    root.addHandler(ch)
    
    outfiles = {}
    pushers = {}
    
    def channel_callback_generator(name, outfiles, formatter):
    	def callback(data):
    #		print("Received data: %s" % repr(json.loads(data)))
    #		print("Channel Callback: %s" % formatter(json.loads(data)))
    		outfiles[name].write(formatter(json.loads(data))+'\n')
    	return callback
    
    def connect_handler_generator(name, pusher, channel, event, outfiles, formatter):
    	return lambda data: pusher.subscribe(channel).bind(event, channel_callback_generator(name, outfiles, formatter))
    
    def exit_handler(signal, frame):
    	for name, f in outfiles.items():
    		f.close()
    	sys.exit(1)
    
    coinmate_formatter = lambda data: "%d;%f;%f"%(data[0]['date']//1000,data[0]['price'],data[0]['amount'])
    bitstamp_formatter = lambda data: "%s;%f;%f"%(data['timestamp'],data['price'],data['amount'])
    
    filename_pattern_basic = 'trades/%(exchange)s/trades_%(exchange)s_%(pair)s_%(timestamp)s.csv'
    
    if __name__ == '__main__':
    	configs = {
    		'bitstamp_trades_btcusd': {
    			'exchange': 'bitstamp',
    			'pair': 'BTCUSD',
    			'appkey': 'de504dc5763aeef9ff52',
    			'channel': 'live_trades',
    			'event': 'trade',
    			'formatter': bitstamp_formatter,
    			'filename_pattern': filename_pattern_basic,
    		},
    		'bitstamp_trades_ltcusd': {
    			'exchange': 'bitstamp',
    			'pair': 'LTCUSD',
    			'appkey': 'de504dc5763aeef9ff52',
    			'channel': 'live_trades_ltcusd',
    			'event': 'trade',
    			'formatter': bitstamp_formatter,
    			'filename_pattern': filename_pattern_basic,
    		},
    		'bitstamp_trades_ltcbtc': {
    			'exchange': 'bitstamp',
    			'pair': 'LTCBTC',
    			'appkey': 'de504dc5763aeef9ff52',
    			'channel': 'live_trades_ltcbtc',
    			'event': 'trade',
    			'formatter': bitstamp_formatter,
    			'filename_pattern': filename_pattern_basic,
    		},
    		'bitstamp_trades_xrpusd': {
    			'exchange': 'bitstamp',
    			'pair': 'XRPUSD',
    			'appkey': 'de504dc5763aeef9ff52',
    			'channel': 'live_trades_xrpusd',
    			'event': 'trade',
    			'formatter': bitstamp_formatter,
    			'filename_pattern': filename_pattern_basic,
    		},
    		'bitstamp_trades_ethusd': {
    			'exchange': 'bitstamp',
    			'pair': 'ETHUSD',
    			'appkey': 'de504dc5763aeef9ff52',
    			'channel': 'live_trades_ethusd',
    			'event': 'trade',
    			'formatter': bitstamp_formatter,
    			'filename_pattern': filename_pattern_basic,
    		},
    		'coinmate_trades_btcczk': {
    			'exchange': 'coinmate',
    			'pair': 'BTCCZK',
    			'appkey': 'af76597b6b928970fbb0',
    			'channel': 'trades-BTC_CZK',
    			'event': 'new_trades',
    			'formatter': coinmate_formatter,
    			'filename_pattern': filename_pattern_basic,
    		},
    		'coinmate_trades_ltcbtc': {
    			'exchange': 'coinmate',
    			'pair': 'LTCBTC',
    			'appkey': 'af76597b6b928970fbb0',
    			'channel': 'trades-LTC_BTC',
    			'event': 'new_trades',
    			'formatter': coinmate_formatter,
    			'filename_pattern': filename_pattern_basic,
    		},
    
    	}
    
    	for name, conf in configs.items():
    		outfname = conf['filename_pattern']%({'exchange': conf['exchange'], 'pair': conf['pair'], 'timestamp': datetime.datetime.now().strftime("%Y%m%d-%H%M%S")})
    		f = open(outfname,'w')
    		outfiles[name] = f
    		p = pusherclient.Pusher(conf['appkey'])
    		p.connection.bind('pusher:connection_established', connect_handler_generator(
    									name=name,
    									pusher=p,
    									channel=conf['channel'],
    									event=conf['event'],
    									outfiles=outfiles,
    									formatter=conf['formatter']
    								)
    		)
    		p.connect()
    		pushers[name] = p
    
    	signal.signal(signal.SIGINT, exit_handler)
    
    	newfiles = False
    	while True:
    		time.sleep(1)
    		now = time.time()
    		if now % 3600 < 5 and not newfiles:
    			for name in outfiles.keys():
    				outfiles[name].close()
    				outfname = configs[name]['filename_pattern']%({'exchange': configs[name]['exchange'], 'pair': configs[name]['pair'], 'timestamp': datetime.datetime.now().strftime("%Y%m%d-%H%M%S")})
    				f = open(outfname,'w')
    				outfiles[name] = f
    			newfiles = True
    		elif now % 3600 > 6 and now % 3600 < 10:
    			newfiles = False
    		elif now % 60 < 1:
    			for name, f in outfiles.items():
    				f.flush()
    

    The script to convert trades to OHLCV data is even more rudimentary:

    import pandas as pd
    import datetime
    import sys
    import os
    import glob
    import logging
    
    logging.basicConfig(filemode='w', level=logging.DEBUG)
    logger = logging.getLogger(__name__)
    handler = logging.FileHandler(sys.argv[0][:-3]+'.log')
    handler.setLevel(logging.INFO)
    logger.addHandler(handler)
    
    def parsedate(timestamp):
    	if type(timestamp) is str: timestamp = int(timestamp) # put up with timestamp being in string format
    	if timestamp > 1e10: timestamp //= 1000 # put up with timestamp being in milliseconds
    	return datetime.datetime.fromtimestamp(float(timestamp))
    
    source_folder = sys.argv[1] # folder where to look for trades_*.csv files containing trades
    target_folder = sys.argv[2] # folder where to output resulting OHLCV data files
    move_folder = sys.argv[3] # folder where to move processed trades_*.csv files (-d instead of a folder makes the script delete the source trade files)
    
    if __name__ == '__main__':
    	timeframes = ['5T','1H','4H','1D']
    
    	fileCount = 0
    	for file in glob.glob(os.path.join(source_folder,'trades_*.csv')):
    		logger.info('Processing file %s'%(file))
    		fileName = os.path.basename(file)
    		chunks = fileName.split('_')
    		exchange = chunks[1]
    		pair = chunks[2]
    		datapd = pd.read_csv(file, names=['t','price','amount'], index_col=0, parse_dates=True, sep=';', date_parser=parsedate)
    		for tf in timeframes:
    			outfileName = os.path.join(target_folder, '%s_%s_ohlc_%s.csv'%(exchange, pair, tf))
    			ohlcpd = datapd.resample(tf).agg({'price':'ohlc','amount':'sum'})
    			ohlcpd['exchange'] = exchange
    			ohlcpd['pair'] = pair
    			logger.debug(list(ohlcpd))
    			ohlcpd.to_csv(outfileName, header=False, mode='a') # append, so each hourly trades file extends the same output file
    		if move_folder == '-d': os.unlink(file)
    		else: os.rename(file, os.path.join(move_folder,fileName))
    		fileCount += 1
    	logger.info('Processed %d input files'%(fileCount))
    
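    For what it's worth, a small helper to read the resulting header-less files back into pandas - the column order here is an assumption, matching what the converter above appears to write (index timestamp, OHLC of price, summed amount, then the exchange/pair tags):

```python
import io
import pandas as pd

# Assumed column layout of the generated OHLCV csv files
COLUMNS = ["timestamp", "open", "high", "low", "close", "amount", "exchange", "pair"]

def load_ohlcv(path_or_buffer):
    """Read one of the header-less OHLCV csv files back into pandas."""
    return pd.read_csv(path_or_buffer, names=COLUMNS, index_col=0, parse_dates=True)

# Usage with an in-memory sample row instead of a real file:
sample = io.StringIO("2018-01-01 00:00:00,100.0,110.0,90.0,105.0,12.5,bitstamp,BTCUSD\n")
df = load_ohlcv(sample)
```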


  • I didn't really like what any of the other libraries had going on, so I wrote my own (for GDAX) with the below output:

    [screenshot of sample output]

    Messing around with websocket live data, cached and aggregated @ 10s slices (with buy & sell vol).

    Both of these are currently dumping to Mongo, but I'm sure they could also be redirected live to pandas or something in a streaming capacity.
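    A rough sketch of that 10s slicing with separate buy/sell volume using pandas (the field names "side"/"size"/"time" are illustrative, not necessarily the exact GDAX feed schema):

```python
import pandas as pd

def aggregate_trades(trades, freq="10s"):
    """Bucket raw trades into fixed-width slices with OHLC price and
    separate buy/sell volume. `trades` is a list of dicts with keys
    time (epoch seconds), price, size, side ("buy"/"sell")."""
    df = pd.DataFrame(trades)
    df.index = pd.to_datetime(df.pop("time"), unit="s")
    bars = df["price"].resample(freq).ohlc()
    # taker-side volume, split per slice
    bars["buy_vol"] = df.loc[df["side"] == "buy", "size"].resample(freq).sum()
    bars["sell_vol"] = df.loc[df["side"] == "sell", "size"].resample(freq).sum()
    return bars.fillna(0.0)

# Tiny example: three trades inside the same 10s slice
trades = [
    {"time": 0, "price": 10.0, "size": 1.0, "side": "buy"},
    {"time": 3, "price": 11.0, "size": 2.0, "side": "sell"},
    {"time": 7, "price": 9.0, "size": 0.5, "side": "buy"},
]
bars = aggregate_trades(trades)
```

    The same frame can then be inserted into Mongo or consumed directly in a backtest.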