High Performance Hardware for Distributed Deep Learning