How to draw a probable outcome from a distribution?

I have collected positional data. To visualize the data, I'd like to draw a 'typical' outcome of an experiment.

The data comes from a few hundred experiments, where I identify a variable number of objects at different positions relative to the origin in 2D. Thus, I can calculate the average number of objects, as well as estimate the empirical distribution of the objects. A plot of the 'typical' outcome would then have the average (or possibly mode) number of objects, say, 5. What I'm not sure about is where to position these 5 objects.
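A minimal sketch of the two count summaries mentioned here. It assumes, hypothetically, that the detections are stored in a data frame `objects` with one row per object and an integer `experiment` ID column. The simulated data is only a stand-in for real measurements.

```r
set.seed(42)
# Hypothetical stand-in data: one row per detected object
n_exp  <- 300
counts <- rpois(n_exp, lambda = 5)                # true objects per experiment
objects <- data.frame(
  experiment = rep(seq_len(n_exp), times = counts),
  x = rnorm(sum(counts)),
  y = rnorm(sum(counts))
)

# Objects per experiment (experiments with zero objects are kept as 0)
per_exp <- tabulate(objects$experiment, nbins = n_exp)

mean_count <- mean(per_exp)
# Mode: the most frequent count (the smallest one wins on ties)
tab <- table(per_exp)
mode_count <- as.integer(names(tab)[which.max(tab)])
```

`tabulate()` with `nbins` is used instead of `table()` on the IDs so that experiments with no detected objects still count towards the mean.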

To simplify the problem, assume that the data follows a 2D normal distribution. If I were just to randomly draw 5 points from the distribution, I might get one point at [3,3], which would be a very rare outcome, and would thus not reflect the 'typical', or 'average' outcome. However, just drawing 5 points at [0,0] would also not make sense: even though [0,0] is the average position of the objects, 5 overlapping points are not an 'average' outcome of the process either.
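To put a number on "very rare", assume (the question does not say this) that both coordinates are independent standard normal. The squared distance from the origin then follows a chi-squared distribution with 2 degrees of freedom, so the chance that a single point lands at least as far out as [3,3] (squared distance 18) is $e^{-9}$, about $1.2\times10^{-4}$. The sketch computes this and checks it by simulation.

```r
# Assumption: x and y are independent N(0, 1)
r2    <- sum(c(3, 3)^2)                               # squared distance of [3,3] = 18
p_one <- pchisq(r2, df = 2, lower.tail = FALSE)       # equals exp(-9) for 2 df
# Chance that at least one of 5 independent draws is that far out
p_any_of_5 <- 1 - (1 - p_one)^5

# Monte Carlo check of p_one
set.seed(1)
n_sim <- 1e6
d2    <- rnorm(n_sim)^2 + rnorm(n_sim)^2
p_mc  <- mean(d2 >= r2)
```

Under this assumption, a set of 5 draws containing such a point occurs roughly 6 times in 10,000.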

In other words, how can I get a 'likely' draw from a distribution?


It looks like I should mention why I don't want to use the usual methods (like a 2D smoothed histogram, or plotting all the points) to look at the 2D distribution.

  1. The objects (which are vesicles (i.e. little spheres) inside cells) vary in number, size and position (distribution of the distance from the cell center, amount of clustering). I would like to display all these features in one graph. Since there are several hundred cells containing many vesicles each, it is not very useful to combine them all in a single plot. I am well aware that I could use a multipanel graph showing the distributions of all parameters, but this would be a lot less intuitive.
  2. I would like to show a 'typical' cell that shows all the salient features that characterize a specific phenotype. This way, if I want to image a particular phenotype in a mixed population, I know what kind of cell I'm looking for.
  3. I think such a plot would be a cool way to display a lot of information at once, and I just want to try.

Maybe it would be clearer if I said that I want to simulate a likely experimental result based on my measurements?

I also think it's not entirely clear what you want. But if you want a set of deterministically chosen points that preserve the mean and covariance of the original distribution, you can use the sigma-point selection method of the unscented Kalman filter.

Say that you want to select $2L+1$ points that fulfill those requirements, where $L$ is the dimension of the data. Then proceed as follows:

$\mathcal{X}_0=\overline{x} \qquad w_0=\frac{\kappa}{L+\kappa} \qquad i=0$

$\mathcal{X}_i=\overline{x}+\left(\sqrt{(\:L+\kappa\:)\:\mathbf{P}_x}\right)_i \qquad w_i=\frac{1}{2(L+\kappa)} \qquad i=1, \dots,L$

$\mathcal{X}_i=\overline{x}-\left(\sqrt{(\:L+\kappa\:)\:\mathbf{P}_x}\right)_i \qquad w_i=\frac{1}{2(L+\kappa)} \qquad i=L+1, \dots,2L$

where $w_i$ is the weight of the $i$-th point,

$\kappa=3-L$ (in case of Normally distributed data),

and $\left(\sqrt{(\:L+\kappa\:)\:\mathbf{P}_x}\right)_i$ is the $i$-th row (or column)* of the matrix square root of the scaled covariance matrix $(\:L+\kappa\:)\:\mathbf{P}_x$ (usually obtained by Cholesky decomposition).

* If the matrix square root $\mathbf{A}$ recovers the original matrix as $\mathbf{A}^T\mathbf{A}$, use the rows of $\mathbf{A}$; if it recovers it as $\mathbf{A}\mathbf{A}^T$, use the columns. MATLAB's chol() falls into the first category, as does R's chol().
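To see why these points reproduce the mean and covariance, write $\mathbf{a}_i$ for the $i$-th row of $\mathbf{A}$, with $\mathbf{A}^T\mathbf{A}=(L+\kappa)\,\mathbf{P}_x$ (the first case in the footnote), so that $\mathcal{X}_i-\overline{x}=\pm\mathbf{a}_i^T$ for the outer points. The weights, the weighted mean and the weighted covariance then work out as:

$$
\begin{aligned}
\sum_{i=0}^{2L} w_i &= \frac{\kappa}{L+\kappa} + 2L\cdot\frac{1}{2(L+\kappa)} = 1,\\
\sum_{i=0}^{2L} w_i\,\mathcal{X}_i &= \overline{x}\sum_{i=0}^{2L} w_i + \frac{1}{2(L+\kappa)}\sum_{i=1}^{L}\left(\mathbf{a}_i^T-\mathbf{a}_i^T\right) = \overline{x},\\
\sum_{i=0}^{2L} w_i\,(\mathcal{X}_i-\overline{x})(\mathcal{X}_i-\overline{x})^T &= \frac{2}{2(L+\kappa)}\sum_{i=1}^{L}\mathbf{a}_i^T\mathbf{a}_i = \frac{1}{L+\kappa}\,\mathbf{A}^T\mathbf{A} = \mathbf{P}_x.
\end{aligned}
$$

For 2D data, $L=2$ and $\kappa=1$, so $w_0=1/3$ and each of the four outer points gets weight $1/6$.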

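Here is a simple example in R:

```r
x <- rnorm(1000, 5, 2.5)
y <- rnorm(1000, 2, 1)

P  <- cov(cbind(x, y))        # sample covariance
V0 <- c(mean(x), mean(y))     # sample mean: the central sigma point
n <- 2; k <- 1                # L = 2 dimensions, kappa = 3 - L = 1
A <- chol((n + k) * P)        # matrix square root: t(A) %*% A == (n + k) * P

# The 2n outer sigma points: V0 plus/minus each row of A
points <- lapply(seq_len(2 * n), function(i) if (i <= n) A[i, ] + V0 else -A[i - n, ] + V0)
V1 <- points[[1]]; V2 <- points[[2]]; V3 <- points[[3]]; V4 <- points[[4]]

# mean (equals V0)
1/(2*(n+k))*(V1+V2+V3+V4) + k/(n+k)*V0
# covariance (equals P); the central point contributes zero
1/(2*(n+k)) * ((V1-V0) %*% t(V1-V0) + (V2-V0) %*% t(V2-V0) + (V3-V0) %*% t(V3-V0) + (V4-V0) %*% t(V4-V0))
```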
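The example above hard-codes four points for 2D data. A minimal generalisation to any dimension $L$ is sketched below. The function name `sigma_points` is made up for illustration, and it uses the default $\kappa = 3 - L$ from above. It returns the $2L+1$ points as rows of a matrix together with their weights, so the weighted mean and covariance can be checked directly.

```r
# Hypothetical helper: sigma points for mean mu and covariance P
sigma_points <- function(mu, P, kappa = 3 - length(mu)) {
  L <- length(mu)
  A <- chol((L + kappa) * P)        # upper triangular: t(A) %*% A == (L + kappa) * P
  # Rows: the mean, then mu + each row of A, then mu - each row of A
  X <- rbind(mu,
             sweep(A, 2, mu, `+`),
             sweep(-A, 2, mu, `+`))
  rownames(X) <- NULL
  w <- c(kappa / (L + kappa), rep(1 / (2 * (L + kappa)), 2 * L))
  list(points = X, weights = w)
}

# Usage on correlated 3D data
set.seed(2)
Z  <- matrix(rnorm(3000), ncol = 3) %*% matrix(c(1, 0.5, 0, 0, 1, 0.3, 0, 0, 1), 3)
mu <- colMeans(Z)
P  <- cov(Z)
sp <- sigma_points(mu, P)

m_hat <- colSums(sp$weights * sp$points)      # weighted mean (weights recycle down rows)
D     <- sweep(sp$points, 2, m_hat)           # points centred on the weighted mean
P_hat <- t(D) %*% (sp$weights * D)            # weighted covariance
```

For $L > 3$ the default $\kappa$ makes $w_0$ negative; the points still match the mean and covariance, but the weights can then no longer be read as probabilities.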

To summarise (please correct me if I'm wrong):

  • You have a set of points for a number of parameters/states.
  • The points provide a joint distribution of the parameters/states.
  • You want to simulate from a model using some typical states.

The problem you have is that you can't write down a nice closed form density.

To tackle this problem you should use a particle filter. Suppose your model of a cell were this simple ODE:

\begin{equation} \frac{dX(t)}{dt} = \lambda X(t) \end{equation}

and your data consists of values of $\lambda$ and $X(0)$. Put this data in a matrix with two columns and $n$ rows, where $n$ is the number of points. Then

  1. Choose a row at random, to get particular values of $\lambda$ and $X(0)$.
  2. Optional step: perturb your parameters with noise.
  3. Simulate from your model, in this case the ODE.
  4. Repeat as necessary.
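A minimal sketch of steps 1 to 4 for the example ODE, whose solution is $X(t) = X(0)\,e^{\lambda t}$, so no numerical solver is needed. The data matrix `theta`, the noise level `sd_noise` and the time grid are made-up placeholders for the real measurements; for brevity only $\lambda$ is perturbed in step 2.

```r
set.seed(3)
# Hypothetical measured data: one row per cell, columns lambda and X0
n <- 500
theta <- cbind(lambda = rnorm(n, -0.5, 0.1),
               X0     = rlnorm(n, 0, 0.3))

times    <- seq(0, 5, by = 0.5)
sd_noise <- 0.01     # step 2 perturbation of lambda; set to 0 to skip it
n_sims   <- 20

simulate_cell <- function() {
  draw   <- theta[sample.int(nrow(theta), 1), ]       # step 1: pick a row at random
  lambda <- draw[["lambda"]] + rnorm(1, 0, sd_noise)  # step 2: optional jitter of lambda
  X0     <- draw[["X0"]]
  X0 * exp(lambda * times)                            # step 3: solve dX/dt = lambda X
}

# Step 4: repeat; each column is one simulated trajectory
sims <- replicate(n_sims, simulate_cell())
```

With a model that has no closed-form solution, step 3 would call an ODE solver instead of evaluating the exponential.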

The key point is that step 1 is a draw from the joint density of $\lambda$ and $X(0)$.
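Drawing whole rows is what keeps the dependence between $\lambda$ and $X(0)$. The sketch below uses made-up correlated data (true correlation 0.8) and compares resampling rows with resampling each column independently; only the former preserves the correlation.

```r
set.seed(4)
n <- 2000
lambda <- rnorm(n)
X0     <- 0.8 * lambda + rnorm(n, sd = 0.6)   # hypothetical: X0 correlated with lambda
theta  <- cbind(lambda, X0)

idx   <- sample.int(n, n, replace = TRUE)
joint <- theta[idx, ]                           # step 1: rows drawn together

indep <- cbind(lambda = sample(theta[, "lambda"], n, replace = TRUE),
               X0     = sample(theta[, "X0"], n, replace = TRUE))  # columns drawn separately

cor(theta)[1, 2]   # correlation in the data, about 0.8
cor(joint)[1, 2]   # close to the data's correlation
cor(indep)[1, 2]   # close to 0: the dependence is lost
```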

This answer could be way off if I've misinterpreted what you mean about simulating from the model. Please correct me if I'm wrong.

One thing you could do is plot the positions of the objects from all your experiments in the 2D plane, one point per object, perhaps colored by experiment (if you have a lot of experiments, you may plot just a random subset of them).

If there is a pattern in the position of the objects it should emerge when doing this.
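A minimal base-R sketch of this plot, assuming a data frame `objects` with columns `experiment`, `x` and `y` (one row per object; the data here is simulated only as a placeholder). It plots a random subset of 30 experiments, each in its own color.

```r
set.seed(5)
# Placeholder data: 300 experiments with a variable number of objects each
n_exp  <- 300
counts <- rpois(n_exp, 5)
objects <- data.frame(experiment = rep(seq_len(n_exp), counts),
                      x = rnorm(sum(counts)), y = rnorm(sum(counts)))

# Random subset of experiments to keep the plot readable
keep <- sample(unique(objects$experiment), 30)
sub  <- objects[objects$experiment %in% keep, ]

# One color per experiment
cols <- rainbow(length(keep))[match(sub$experiment, keep)]
plot(sub$x, sub$y, col = cols, pch = 19, asp = 1,
     xlab = "x", ylab = "y", main = "Objects from 30 random experiments")
abline(h = 0, v = 0, lty = 3)   # dotted lines through the origin
```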

Also, depending on what you are measuring, it may not be the absolute position that counts but the relative position of the objects. In that case you could rotate each experiment's positions around the origin so that, for instance, the first point always lies on the x axis.
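A sketch of that rotation for one experiment, assuming its objects are given as an $n \times 2$ matrix of coordinates relative to the origin; the helper name `align_first` is hypothetical. Each experiment is rotated by minus the polar angle of its first point, which leaves all distances to the origin unchanged.

```r
# Hypothetical helper: rotate so that the first point lies on the positive x axis
align_first <- function(xy) {
  theta <- atan2(xy[1, 2], xy[1, 1])                # polar angle of the first point
  R <- matrix(c(cos(theta), sin(theta),
                -sin(theta), cos(theta)), 2, 2)     # rotation by -theta for row vectors
  xy %*% R
}

# Usage on one made-up experiment
xy <- matrix(c( 1,  1,
               -2,  0.5,
                0, -1.5), ncol = 2, byrow = TRUE)
aligned <- align_first(xy)
aligned[1, ]   # sqrt(2) and 0, up to rounding
```

For many experiments stored in a data frame like `objects` above, the same helper can be applied per group, e.g. `lapply(split(objects[, c("x", "y")], objects$experiment), function(d) align_first(as.matrix(d)))`.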

Maybe you could use a smoothed scatterplot? It is the 2D analogue of kernel density estimation.
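In R, base graphics already provides `smoothScatter()`, which draws a kernel-density-smoothed scatterplot. The sketch below instead computes the 2D kernel density explicitly with `MASS::kde2d()` (MASS ships with R as a recommended package), so the density grid itself can also be inspected. The pooled positions are simulated placeholders.

```r
library(MASS)

set.seed(6)
# Placeholder: pooled object positions from all experiments
x <- rnorm(1500)
y <- 0.5 * x + rnorm(1500, sd = 0.8)

# 2D kernel density estimate on a 100 x 100 grid (default bandwidths)
dens <- kde2d(x, y, n = 100)

image(dens, col = hcl.colors(50, "YlOrRd", rev = TRUE), xlab = "x", ylab = "y")
contour(dens, add = TRUE, drawlabels = FALSE)

# One-line alternative from base graphics:
# smoothScatter(x, y)
```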