Friday, May 31, 2013

Visualizations, video/audio, and ML for time series data: which platform?

I want to build interactive visualizations of a bunch of time series, along with some audio and video, and scrub through them while keeping them all synced. Then I want to generate features from them and feed those into a machine learning model. It doesn't need to be a web app.
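To make the "keep them all synced" part concrete, here is a minimal sketch, assuming each series is stored as a sorted list of timestamps (in seconds) plus a parallel list of values. The `series` layout and the names `index_at` and `synced_values` are made up for illustration. The idea is that one shared playhead time (for example, the current position of the video) drives a binary search into every series.

```python
from bisect import bisect_right


def index_at(timestamps, playhead):
    """Return the index of the latest sample at or before `playhead`.

    `timestamps` must be sorted ascending. Returns None when the playhead
    is before the first sample.
    """
    i = bisect_right(timestamps, playhead) - 1
    return i if i >= 0 else None


def synced_values(series, playhead):
    """Look up the current value of every series at one shared playhead.

    `series` maps a name to a (timestamps, values) pair. This layout is
    hypothetical, chosen only for this sketch.
    """
    out = {}
    for name, (timestamps, values) in series.items():
        i = index_at(timestamps, playhead)
        out[name] = None if i is None else values[i]
    return out


# Two series sampled at different rates, scrubbed to t = 1.2 seconds.
series = {
    "accel": ([0.0, 0.5, 1.0, 1.5], [0.1, 0.4, 0.2, 0.9]),
    "heart_rate": ([0.0, 1.0, 2.0], [62, 64, 70]),
}
print(synced_values(series, 1.2))  # {'accel': 0.2, 'heart_rate': 64}
```

The same lookup works whatever supplies the playhead, so the series don't need to share a sampling rate.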

What environment do we do this in? Cross-platform choices seem to be javascript/browser, python/installed, and java/installed.

Visualization:
Javascript: d3 (gallery), cubism, Google Charts, some others like rickshaw, even processing.js. This is maybe the best reason to pick JS.
Python: matplotlib, which makes mostly static visualizations (gallery), or maybe Bokeh (see this post). Bokeh currently outputs a Chaco plot and is supposed to output to an HTML5 canvas in the future.
Java: ...?

Playing audio/video:
Javascript in a browser: HTML5 video and audio?
Python/Java: beats me. Codec hell?

Machine learning:
Javascript: ...?
Python: scikit-learn (and others)
Java: Weka (I hear the API's a pain, though.)

The path forward seems to be to start building an HTML/JS app, even if it's only client side, and figure something out for the machine learning. Perhaps compile scikit-learn to JS with pyjs? Perhaps (this sounds kind of painful) just send all the features to a server, use Weka or scikit-learn there to do the real ML, and send back the results? But I'd welcome any input.
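For the "send all the features to a server" option, here is a rough sketch of what the feature side might look like, using only the Python standard library. The window size, the step, the choice of summary statistics, and the JSON layout are all assumptions for illustration. The server would parse the JSON and hand the rows to whatever ML library it uses.

```python
import json
from statistics import mean, pstdev


def window_features(values, size, step):
    """Slide a fixed-size window over `values` and summarize each window."""
    rows = []
    for start in range(0, len(values) - size + 1, step):
        window = values[start:start + size]
        rows.append({
            "start": start,          # index of the first sample in the window
            "mean": mean(window),
            "std": pstdev(window),   # population standard deviation
            "min": min(window),
            "max": max(window),
        })
    return rows


def to_payload(name, rows):
    """Serialize feature rows as a JSON string, ready to send to a server."""
    return json.dumps({"series": name, "features": rows})


signal = [1, 2, 3, 4, 5, 6]
rows = window_features(signal, size=4, step=2)  # windows start at 0 and 2
payload = to_payload("demo", rows)
```

Sending a handful of summary numbers per window instead of the raw samples keeps the round trip small, which may take some of the pain out of the server approach.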

2 comments:

  1. Is this a one-off for use on your own system, or are you distributing this business?

    If you're distributing, I agree with your path forward, I think. I feel like we've all been doing interactive graphics in browsers for a while now, so why switch? :-) Everyone's got a browser.

    I know nothing about machine learning, so can't speak there.

  2. Yeah. It's for a little bit of distribution; send it to a few researchers who can tolerate a few bugs. But still, a browser could solve or prevent a lot of them. The only issue is that it has to be able to do all these things (visualizations, audio and video, and machine learning) so I want to make sure we don't hit big problems later.
