Tuesday, April 6, 2010

Songbirds, Genes, and Neurons

I just read a New York Times article, From a Songbird, New Insights Into the Brain by Nicholas Wade, and it reminded me of a number of machine understanding issues. So I am going to take a break from my series on the Hierarchical Temporal Memory (HTM) model and muse on intelligence, understanding and song.

The article gives a minimum of information on how genes actually affect the ability of a bird to learn and sing a song. The key revelations of the article are that the zebra finch (Taeniopygia guttata) has had its genome decoded and that about 800 genes change their activity levels in neurons when the finch sings. The article implies that defects in these genes might interrupt singing ability, just as mutated FOXP2 genes in humans cause speech defects. In particular, the bird version of FOXP2, if defective, prevents songbirds from singing.

This would seem to go against my basic understanding of how systems of neurons work, which I like to think is in line with the current scientific consensus. Once a basically functioning neural network is in place, I thought genetic activity becomes background activity. Of course the genes would function just like they do in any cell, releasing instructions for making proteins that regulate cell activity. And maybe some of the 800 genes mentioned in Wade's article are ones that would up-regulate or down-regulate any neural activity, not just songs or learning. But according to David F. Clayton, "these transcripts don't result in the cells producing proteins in the usual way. Instead they seem to modulate the activity of other genes involved in listening."

My (learned from textbooks) model is: genes have blueprints for several types of neurons with varying synapses, neurotransmitters and receptors. Signals are conducted by reasonably well understood mechanisms involving membrane potentials along the neurons and either chemical or electrical transmission at synapses. Genes in the neuron are just caretakers once a system is set up. Learning results from a strengthening or weakening of synaptic thresholds. This is called Hebbian learning, and while there are some theories about how Hebbian learning works at the molecular level, at this point I don't take them as proven.
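To make the textbook model concrete, here is a minimal sketch of a Hebbian-style weight update in Python. The function name, the learning rate and the decay term are my own illustrative assumptions, not a model of any real neuron; the decay term is only there so that a connection can weaken as well as strengthen.

```python
def hebbian_update(weight, pre, post, rate=0.1, decay=0.0):
    """Return the new connection strength after one Hebbian-style step.

    weight: current strength of the synapse (a hypothetical scalar)
    pre, post: activity of the sending and receiving neuron (0.0 to 1.0)
    rate: how strongly co-activity reinforces the connection (assumed value)
    decay: fraction of the weight lost per step (assumed, allows weakening)
    """
    return weight + rate * pre * post - decay * weight


# Both neurons active together: the connection strengthens.
strengthened = hebbian_update(0.5, pre=1.0, post=1.0)

# Sending neuron silent, with decay switched on: the connection weakens.
weakened = hebbian_update(0.5, pre=0.0, post=1.0, decay=0.1)

print(strengthened, weakened)
```

Nothing in this rule involves gene activity, which is exactly why the songbird result surprised me.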

If the article is true as presented, then individual neurons are more complex than I thought. It is implied that many neurons can function just fine with a mutated FOXP2 gene (every gene would be in every neuron, in fact in every cell), but not the neurons that are involved in learning songs. Other neurons learn just fine.

What would distinguish a song-learning neuron from a muscle-coordination learning neuron? I don't know.

As is typical with the New York Times, they want to keep you in their ad ghetto, so they provide no link to the research report, but they say it is in the current issue of Nature. Here is the link: The genome of a songbird

Monday, April 5, 2010

Bitworm HTM Example Program, Part 3: Spatial and Temporal Pool Overview

In Understanding Bitworm Part 2 I wrote: "Among the other parameters of CreateNode you can see spatialPoolerAlgorithm and temporalPoolerAlgorithm. I don't think I have used "pooling" yet. Remember I wrote about quantization points? [See How do HTMs Learn?] There are a number of available points both for spatial and temporal patterns in the unsupervised nodes. They need to be populated, and they may change during the learning phase. Pooling appears to be NuSpeak for this process; a pooler algorithm is the code that matches up incoming data to quantization points."

To learn about the pooling algorithms I went to the Numenta Node Algorithm Guide, which is not at the Numenta web site, but installs with NuPIC under \Program Files\Numenta\nupic-1.7.1\share\doc\NodeAlgorithmsGuide.pdf.

There are two node types that implement the NuPIC learning algorithms: SpatialPoolerNode and TemporalPoolerNode.


Some confusion might exist because in more general Numenta discussions a node is treated as a single entity, but both a spatial and a temporal node are needed to create a functioning general node. When the unsupervised node in Bitworm is created with CreateNode(Zeta1Node,...), in effect both a SpatialPoolerNode and a TemporalPoolerNode are created to get full functionality, with both node types sitting in the same level of the HTM hierarchy. But you can design more complicated patterns by arranging SpatialPoolerNode and TemporalPoolerNode in an HTM as needed, rather than always pairing them on a level.

"Spatial pooling can be thought of as a quantization process that maps a potentially infinite number of input patters to a finite number of quantization centers." Which in other lit Numenta calls quantization points. Data, in our HTM world, has a spatial aspect. This might not be change along a spatial dimension; space has a more general sense. For instance, the space might be a range of voltages, or sets of voltages from an EKG, for instance. Spatial data usually varies so complexly that we are only interested in the data that is created by objects, or causes. Spatial pooling groups the data into a limited number of causes (or guesses about causes).

Temporal pooling does the same thing with the patterns (objects) identified by the spatial pooler over time sequences. "If pattern A is frequently followed by pattern B, the temporal pooler can assign them to the same group."
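The "A is frequently followed by B" idea can be sketched in a few lines of Python. This is my own guess at the general technique, not Numenta's algorithm: it counts transitions between consecutive pattern labels and merges any two patterns whose transition count reaches a threshold. The function name and the min_count parameter are hypothetical.

```python
from collections import Counter


def temporal_groups(sequence, min_count=2):
    """Group pattern labels that frequently follow one another.

    sequence: pattern labels in the order the spatial pooler reported them
    min_count: transitions needed before two patterns share a group (assumed)
    """
    # Count how often each pattern is immediately followed by each other one.
    transitions = Counter(zip(sequence, sequence[1:]))

    # Simple union-find: every pattern starts out as its own group.
    parent = {p: p for p in sequence}

    def find(p):
        while parent[p] != p:
            p = parent[p]
        return p

    # Merge the groups of any two patterns linked by a frequent transition.
    for (a, b), count in transitions.items():
        if a != b and count >= min_count:
            parent[find(b)] = find(a)

    groups = {}
    for p in parent:
        groups.setdefault(find(p), set()).add(p)
    return sorted(sorted(g) for g in groups.values())


print(temporal_groups(list("ABABABCDCDCDAB")))
```

In this toy sequence A and B alternate often, as do C and D, while the jumps from B to C and from D to A each happen only once, so the sketch ends up with two groups.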

A group of nodes forming an HTM level may be able to form invariant representations of objects by combining spatial and temporal pooling. If it can, it passes these representations up the hierarchy.

Once learning is achieved the nodes can be used for inference: they can identify new data as containing patterns that have already been learned.

For now I will focus on the learning phase, since the inference phase is relatively easy to understand if you understand how learning takes place.


I just realized the paper I am reading does not actually give the algorithms used. However, the key algorithm is probably related to the maxDistance parameter. Distance here could be ordinary distance, but it is more likely to be distance within a generalized, possibly many-dimensional, heterogeneous pattern space. All kinds of problems leap to mind for writing such a generalized algorithm. I would bet that space/data specific algorithms would really help here (sound vs. sight vs. spatial orientation of human fingers), but perhaps if the quantification is always done before the data is fed in, it is just a matter of matching numbers. Anyway, if you have a distance function, you can group the spatial patterns as falling around a set of centers. These centers are your quantization points. As discussed elsewhere these points are flexible; if a lot of patterns fall close to each other, you might want to tighten up the distance parameter because otherwise you don't use all your allocation of quantization points. That should happen automatically, but either it doesn't, so you need to set the maxDistance parameter, or it does but you still have the option of disagreeing with the automatic or default settings.

Your number of quantization points is set by maxCoincidenceCount. "Storing too few coincidence patterns can result in loss of accuracy due to loss of information. Storing too many coincidence patterns can result in lower generalization and longer training times."
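Since the Guide does not give the algorithm, here is a rough Python sketch of how a distance threshold and a cap on stored patterns could interact. It is my own assumption, not Numenta's code: I use plain Euclidean distance, and the names learn_coincidences, max_distance and max_coincidence_count are stand-ins that merely echo the maxDistance and maxCoincidenceCount parameters.

```python
import math


def learn_coincidences(patterns, max_distance, max_coincidence_count):
    """Collect quantization points from a stream of input patterns.

    A pattern becomes a new stored point only if it is farther than
    max_distance from every point stored so far and there is still room.
    """
    centers = []
    for pattern in patterns:
        # Euclidean distance is an assumption; the real metric is not documented here.
        near_existing = any(
            math.dist(pattern, center) <= max_distance for center in centers
        )
        if not near_existing and len(centers) < max_coincidence_count:
            centers.append(tuple(pattern))
    return centers


patterns = [(0, 0), (0.1, 0), (5, 5), (5, 5.2), (10, 0)]

# A loose threshold folds near-duplicates into three points.
print(learn_coincidences(patterns, max_distance=0.5, max_coincidence_count=10))

# A tight threshold stores every pattern separately.
print(learn_coincidences(patterns, max_distance=0.05, max_coincidence_count=10))
```

Tightening the threshold uses up more of the allocation, and a small cap simply stops storing once it is full, which is the trade-off the quoted passage describes.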

You can also set the sigma parameter. Here's another insight into the algorithm: "each input pattern is compared to the stored patterns assuming that the stored patterns are centers of radial basis functions with Gaussian tuning. The sigma parameter specifies the standard deviation of the Gaussian [distribution]." So this would work, along with maxDistance, in matching incoming data patterns to existing quantization points.
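Going only by that quoted sentence, Gaussian matching might look something like the following Python sketch. The function name, the use of Euclidean distance and the normalization step are my assumptions; the Guide does not spell out any of them.

```python
import math


def coincidence_beliefs(pattern, centers, sigma):
    """Score an input against each stored pattern with a Gaussian radial basis function.

    Returns one value per stored pattern, normalized to sum to 1.
    """
    scores = [
        math.exp(-math.dist(pattern, center) ** 2 / (2 * sigma ** 2))
        for center in centers
    ]
    total = sum(scores)
    if total == 0:
        # Every score underflowed to zero: fall back to a uniform guess.
        return [1 / len(centers)] * len(centers)
    return [score / total for score in scores]


centers = [(0, 0), (5, 5)]

# A small sigma makes the match sharply prefer the nearest stored pattern.
print(coincidence_beliefs((1, 1), centers, sigma=1.0))

# A large sigma spreads belief more evenly across the stored patterns.
print(coincidence_beliefs((1, 1), centers, sigma=5.0))
```

So sigma controls how forgiving the match is: the wider the Gaussian, the more an input that falls between two stored patterns is credited to both.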

The clonedNodes parameter allows a set of spatial nodes to use the same coincidence patterns. This allows all the nodes in a level to detect the same causes. In vision that could be moving lines, spots, etc.

The spatial pooler nodes take inputs with the bottomUpIn parameter. The spatial pattern outputs in inference mode are in bottomUpOut; outputs in learning mode go to a temporal pooler.


Temporal pooling has more options than spatial pooling, in particular offering parameters for both first-order and higher-order learning.

Your number of temporal groups, or time quantization points, is set by requestedGroupCount.

You can select from a variety of algorithms to compute output probabilities with the temporalPoolerAlgorithm parameter, but the choice has no impact on the learning algorithm.

There are a number of sequencer parameters that allow control of the algorithm. sequencerWindowCount allows for multiple stages of discovery (the default is 10). sequencerWindowLength allows segmentation of the input sequence to look for patterns. sequencerModelComplexity apparently allows you to adjust for how the recognizable patterns are balanced between the spatial and temporal dimensions. Some objects produce mainly spatial patterns, others mainly temporal, and most combine the two to a greater or lesser degree.

As with SpatialPoolerNode, you can clone the nodes if you desire. bottomUpIn takes the data in from one or more spatial pooler nodes. bottomUpOut is the resulting vector of real numbers representing "the likelihood that the input belongs to each of the temporal groups of this node."
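To picture what such an output vector could look like, here is a small Python sketch that turns per-pattern beliefs into one likelihood per temporal group. This is a guess at the general shape, not the documented behavior of TemporalPoolerNode: the function name is hypothetical, and combining by max or by sum are just two plausible choices.

```python
def group_likelihoods(beliefs, groups, combine=max):
    """Turn per-pattern beliefs into one normalized likelihood per temporal group.

    beliefs: dict mapping each spatial pattern label to its belief
    groups: list of temporal groups, each a list of pattern labels
    combine: how to merge the beliefs inside a group (max or sum, both assumed)
    """
    raw = [combine(beliefs[p] for p in group) for group in groups]
    total = sum(raw)
    if total == 0:
        # No evidence for any group: fall back to a uniform vector.
        return [1 / len(groups)] * len(groups)
    return [value / total for value in raw]


beliefs = {"A": 0.5, "B": 0.3, "C": 0.1, "D": 0.1}
groups = [["A", "B"], ["C", "D"]]

print(group_likelihoods(beliefs, groups, combine=max))
print(group_likelihoods(beliefs, groups, combine=sum))
```

Either way the result is a vector of real numbers with one entry per temporal group, which is what the next level up would receive as its input.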

In addition to parameters, TemporalPoolerNode takes a command: predict, but it works only in inference mode.


Despite not revealing the details of the algorithms, the Guide, plus the previous materials I read, gave me a good overview of what the algorithms need to achieve. I am pretty sure that I could write algorithms that do approximately what the Numenta pooling algorithms do, but since they have been playing with this for years, I would rather catch up by examining the code inside the Numenta classes.

See also: More on Internal Operations of Nodes