I got an email from Numenta today, telling me that Dileep George and Jeff Hawkins had a paper, "Towards a Mathematical Theory of Cortical Micro-circuits", published in the PLoS Computational Biology journal. Part of the abstract states:
“we describe how Bayesian belief propagation in a spatio-temporal hierarchical model, called
Hierarchical Temporal Memory (HTM), can lead to a mathematical model for cortical circuits. An HTM node is abstracted using a coincidence detector and a mixture of Markov chains.”
HTM is the model being used by Numenta and others to crack to machine understanding problem by making an abstract, computable model of the human cortex (brain). I figured the article would explain “Bayesian belief propagation” which parses into three words I understand. I knew I had run into the term “Markov chains” before, in probability theory and elsewhere. I thought I would just refresh my memory about good old Andrey Markov with Wikipedia. But the Markov chains article at Wikipedia was tough going, jumping into the math without clearly explaining it in English. The definition there was comprehensible: “a random process where all information about the future is contained in the present state.” That implies there is a set of rules, expressed as probabilities, so that if you know the current state of a system you can predict its future state probabilities. It neatly combines determinism with indeterminism. For a first impression, I found the Examples of Markov Chains article more helpful.
I thought I might find a better explanation on my book shelf. There was nothing in my Schaum’s Statistics, but my next guess (thank you, cortex) Schaum's Outline of Operations Research
by Richard Bronson and Govindasami Naadimuthu, has an entire chapter, “Finite Markov Chains,” starting on page 369.
Markov chains are a subset of Markov processes. Markov processes consist of a set of objects with a set of states for the objects. At each point in time each object must be in one of the states. As time progresses, “the probability that an object moves from one state to another state in one time period depends only on the two states,” the current state and the probabilities for the future states given the current state.
For example, in the board game Monopoly each “square” or piece of real estate is part of the set of objects. The two dice create a probability structure, the standard one for two dice. The probability that, at any point, a player will transition to the same point, the forward adjacent point, or any point beyond the 12th point forward is zero, because the dice can only roll numbers between 2 and 12. However, more typically there would be a set of states and a probability for the transition to each of the states, including the transition of remaining in the current state.
If there are N possible states, and you know all the probabilities for transitions, you can show this and manipulate it mathematically with a matrix P, the stochastic or transition matrix. There would be N rows and columns in P. As in all probability theory, in any row the probabilities add up to 1.
N=2 is the simplest meaningful number of states. Let’s call them black and white. It is really only interesting if the transitions are different. Say 90% of the time black transitions to white, but only 20% of the time white transitions to black. Then the matrix P would be
There is some ability to predict the future in a statistical way if you have a Markov chain like this. Without doing any math, you would expect over time that if you have only one object, it would probably be white. If it is black, probably the next state of the sequence flips to white, but if it is white, probably the next state will be white. If you have a large number of objects, you can know the likelihood that a given percentage of them will be white or black. Yet you can’t know for sure. Even the least likely state, all objects black, can occur at any time if each black stays black but each white transitions to black.
You can see that Markov chains might be used to represent the states of neurons or of collections of neurons. In my simple examples above, each object influences only its own future state. But in real systems often future state probabilities are determined by the states of a collection of objects. It could be a collection of transistors, or a collection of nerve cells, or HTM nodes.