The Azimuth Project
Personal- DavidTweed- Ideas for simulation

Longer-term ideas for simulation:

Problem: “visualising” high-dimensional spaces is difficult, and it is often something stumbled upon during the visualisation process that gets investigated further. It would be interesting to try to be more systematic. In particular, people doing exploratory work often try various things until they spot a possible pattern “by eye”, then devote a great deal of work to trying to support it (cf. power laws). This is potentially biased by the order in which they look at things and by what catches their eye enough to pursue. Looking more widely for crude patterns may reduce some of this bias.

Idea:

Problem independent:

  1. Have a large library of simple-ish statistical functions, eg, linear, Gaussian, log-normal.

  2. Use automatic differentiation to obtain their derivatives for semi-efficient fitting.
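As a rough illustration of the two points above, here is a minimal sketch of forward-mode automatic differentiation using dual numbers, applied to a tiny library of trial functions. Everything named here (`Dual`, `dexp`, `dlog`, `value_and_grad`, and the parameterisations of `linear`, `gaussian` and `log_normal`) is an assumption made for the example, not the code referred to on this page.

```python
import math


class Dual:
    """Number val + der*eps with eps**2 == 0; `der` carries the derivative."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __sub__(self, other):
        other = Dual.lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return Dual.lift(other) - self

    def __mul__(self, other):
        other = Dual.lift(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

    def __truediv__(self, other):
        other = Dual.lift(other)
        return Dual(self.val / other.val,
                    (self.der * other.val - self.val * other.der) / other.val ** 2)

    def __rtruediv__(self, other):
        return Dual.lift(other) / self


def dexp(u):
    u = Dual.lift(u)
    e = math.exp(u.val)
    return Dual(e, e * u.der)


def dlog(u):
    u = Dual.lift(u)
    return Dual(math.log(u.val), u.der / u.val)


# Hypothetical trial functions: each maps (x, params) to a model value.
def linear(x, p):
    return p[0] * x + p[1]


def gaussian(x, p):  # p = (amplitude, mean, width)
    z = (x - p[1]) / p[2]
    return p[0] * dexp(-0.5 * z * z)


def log_normal(x, p):  # p = (amplitude, log-mean, log-width), x > 0
    z = (dlog(x) - p[1]) / p[2]
    return p[0] / x * dexp(-0.5 * z * z)


def value_and_grad(f, x, params):
    """Evaluate f(x, params) and its gradient with respect to params.

    One forward pass per parameter, seeding that parameter's derivative with 1.
    """
    value, grad = None, []
    for i in range(len(params)):
        seeded = [Dual(p, 1.0 if j == i else 0.0) for j, p in enumerate(params)]
        out = Dual.lift(f(x, seeded))
        value = out.val
        grad.append(out.der)
    return value, grad
```

For example, `value_and_grad(gaussian, 1.3, [2.0, 1.0, 0.5])` returns the model value together with its partial derivatives with respect to amplitude, mean and width. Forward mode costs one pass per parameter, which seems acceptable for trial functions with only a handful of parameters.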

Creating this “library of trial functions” can be semi-automated using an automated expression enumerator. The code is partially done, but it’s looking a bit tricky to generate expressions in a way that doesn’t produce disguised versions of the same function (eg, $\exp(a x + b)$ vs $c \exp(a x)$). Actually, these aren’t exactly equivalent if $b$ is restricted to be real, but allowing complex values opens up a whole can of worms.
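To spell out the exponential example, the two forms are related by the substitution $c = e^{b}$:

```latex
\exp(a x + b) = e^{b} \exp(a x) = c \exp(a x), \qquad c = e^{b}.
```

For real $b$ this only reaches $c > 0$, so $c \exp(a x)$ with $c \le 0$ has no counterpart in the first form. For complex $b$, $c = e^{b}$ can take any non-zero complex value.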

Problem specific:

  1. Run simulation to generate reasonable estimates of what happens.

  2. Pick random subset of variables and project data onto it.

  3. Try to use simple factor analysis (eg, low-rank approximation via the Eckart-Young theorem) to see if the data is (very close to) a separable function. If so, each factor can be considered separately. Otherwise, more complicated functions need to be tried.

  4. Try to fit a variety of statistical functions from the library (using derivatives in non-linear least squares fitting or something more involved).

  5. Try again from 2.

  6. After a certain number of random trials, report the closest/simplest fits (in some sense) to the user for more manual investigation.
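The separability check in step 3 could be sketched as follows, assuming the projected data for a pair of variables has already been binned onto a regular grid (the binning is an assumption of this example, not something the steps above specify). By the Eckart-Young theorem the best rank-1 approximation of the table keeps only the largest singular value, so the relative error is $\sqrt{1 - \sigma_1^2 / \|F\|_F^2}$. The names `rank_one_residual` and `tabulate` are hypothetical, and plain power iteration is used here only to keep the sketch dependency-free; a library SVD would be the more usual choice.

```python
import math
import random


def rank_one_residual(table, iterations=200, seed=0):
    """Relative Frobenius error of the best rank-1 approximation of `table`.

    A value near 0 suggests table[i][j] is close to g(x_i) * h(y_j).
    """
    rows, cols = len(table), len(table[0])
    total = sum(v * v for row in table for v in row)
    if total == 0.0:
        return 0.0
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(cols)]
    # Power iteration on table^T table for the leading right singular vector.
    for _ in range(iterations):
        u = [sum(row[j] * v[j] for j in range(cols)) for row in table]
        w = [sum(table[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    norm_v = math.sqrt(sum(x * x for x in v))
    v = [x / norm_v for x in v]
    u = [sum(row[j] * v[j] for j in range(cols)) for row in table]
    sigma1_sq = sum(x * x for x in u)  # squared largest singular value
    # If the iteration has not converged this over-estimates the residual.
    return math.sqrt(max(0.0, 1.0 - sigma1_sq / total))


def tabulate(f, xs, ys):
    """Sample f on the grid xs x ys (stand-in for binned simulation output)."""
    return [[f(x, y) for y in ys] for x in xs]


xs = [i / 4 - 1 for i in range(9)]
separable = tabulate(lambda x, y: math.exp(-x * x) * (1 + y * y), xs, xs)
coupled = tabulate(lambda x, y: math.exp(-(x - y) ** 2), xs, xs)
print(rank_one_residual(separable))  # close to zero
print(rank_one_residual(coupled))    # clearly non-zero
```

What counts as “very close to” separable is left open here; a threshold on the residual would have to be chosen with the noise level of the simulation in mind.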

This is similar-ish to some other ideas, eg, Eureqa.
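The fitting in step 4 might look roughly like the sketch below: a small Levenberg-Marquardt loop (one common form of non-linear least squares, chosen here for illustration rather than taken from this page) driven by a model that returns its value and parameter gradient. The hand-written gradient in `exponential` stands in for whatever automatic differentiation would supply, and all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def sum_sq(model, xs, ys, params):
    return sum((model(x, params)[0] - y) ** 2 for x, y in zip(xs, ys))


def fit(model, xs, ys, params, iterations=100, damping=1e-3):
    """Levenberg-Marquardt; `model(x, params)` returns (value, gradient)."""
    params = list(params)
    n = len(params)
    cost = sum_sq(model, xs, ys, params)
    for _ in range(iterations):
        jtj = [[0.0] * n for _ in range(n)]
        jtr = [0.0] * n
        for x, y in zip(xs, ys):
            value, grad = model(x, params)
            r = value - y
            for i in range(n):
                jtr[i] += grad[i] * r
                for j in range(n):
                    jtj[i][j] += grad[i] * grad[j]
        for i in range(n):
            jtj[i][i] += damping * max(jtj[i][i], 1e-12)  # Marquardt scaling
        step = solve(jtj, [-g for g in jtr])
        trial = [p + s for p, s in zip(params, step)]
        try:
            trial_cost = sum_sq(model, xs, ys, trial)
        except OverflowError:
            trial_cost = math.inf
        if trial_cost < cost:  # accept the step and trust the model more
            params, cost = trial, trial_cost
            damping = max(damping / 10.0, 1e-12)
        else:                  # reject the step and damp harder
            damping *= 10.0
    return params, cost


def exponential(x, p):
    """c * exp(a x) and its gradient with respect to (a, c)."""
    a, c = p
    e = math.exp(a * x)
    return c * e, [c * x * e, e]


xs = [i / 10 for i in range(20)]
ys = [2.0 * math.exp(0.7 * x) for x in xs]
best, cost = fit(exponential, xs, ys, [0.5, 1.0])
print(best, cost)
```

The returned sum of squared residuals is one possible ingredient for ranking fits in step 6, although comparing functions with different numbers of parameters would need some penalty for complexity.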

Personal- David Tweed- Type systems for scientific computation

category: personal