Stochastic differential equations provide a rich class of flexible generative models, capable of describing a wide range of spatio-temporal processes. A host of recent work looks to learn data-representing SDEs, using neural networks and other flexible function approximators. Despite these advances, learning remains computationally expensive due to the sequential nature of SDE integrators. In this work, we propose an importance-sampling estimator for probabilities of observations of SDEs for the purposes of learning. Crucially, the approach we suggest does not rely on such integrators. The proposed method produces lower-variance gradient estimates compared to algorithms based on SDE integrators and has the added advantage of being embarrassingly parallelizable. This facilitates the effective use of large-scale parallel hardware for massive decreases in computation time.
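To make the parallelism argument concrete, the sketch below shows plain importance sampling for the probability of an observation in a toy latent-variable model (a generic stand-in, not the paper's SDE estimator): because the proposal samples are drawn i.i.d., the whole estimate is one vectorized computation with no sequential loop, unlike a step-by-step SDE integrator. The model, the proposal `q`, and all numerical values here are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Toy model (illustrative, not the paper's setup):
#   latent x ~ N(0, 1), observation y | x ~ N(x, s_obs^2).
# The marginal p(y) is analytically N(y; 0, 1 + s_obs^2), so the
# importance-sampling estimate can be checked against the exact value.
s_obs = 0.5
y = 1.3

# Proposal q(x) = N(y, 1). All n samples are independent, so the estimator
# is embarrassingly parallel: one vectorized pass, no sequential integration.
n = 200_000
x = rng.normal(loc=y, scale=1.0, size=n)            # x_i ~ q
log_w = (norm.logpdf(y, loc=x, scale=s_obs)         #   log p(y | x_i)
         + norm.logpdf(x)                           # + log p(x_i)
         - norm.logpdf(x, loc=y, scale=1.0))        # - log q(x_i)
p_hat = np.exp(log_w).mean()                        # p(y) ~= mean importance weight

p_true = norm.pdf(y, loc=0.0, scale=np.sqrt(1.0 + s_obs**2))
print(p_hat, p_true)
```

Since each weight depends only on its own sample, the same structure maps directly onto large-scale parallel hardware; a sequential integrator, by contrast, must advance each trajectory one time step at a time.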