Machine Understanding, by William P. Meyers<br />
<br />
<b>ApeWorm Brain Overview</b> (2012-08-20)<br />
(part of the series <a href="http://machineunderstanding.blogspot.com/2012/06/aping-as-basis-of-intelligence.html">Aping As the Basis of Intelligence</a>)<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://www.openicon.com/mu/mu_blog/2012/images/fig_2_brain_overview_sm.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="323" src="http://www.openicon.com/mu/mu_blog/2012/images/fig_2_brain_overview_sm.png" width="640" /></a></div>
<br />
<br />
Figure 2 gives an overview of the ApeWorm "central nervous system," or control system.<br />
<br />
Two eyes, or retinas, gather information about the exemplar; that information is analyzed to form an exemplar inner model. The ApeWorm also analyzes its internal sensors to construct a self inner model, or self-image. If the two models match, the aping control function registers success and signals the muscle control functions to stop movement. It may also reinforce the "synapses," or weights, that led to the success, including any temporal sequence involved. If there is no match, muscle control allows movement to continue.<br />
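This match/no-match loop can be sketched in Python (my choice of language here; all of the names below are hypothetical placeholders, and the additive weight update stands in for whatever reinforcement rule is eventually used):

```python
def aping_step(self_joints, exemplar_joints, weights, reward=0.1):
    """One control cycle: compare the self inner model to the exemplar model.

    Joint models are tuples of 'L'/'C'/'R' states; `weights` is a list of
    numbers standing in for the "synapses" to be reinforced on success.
    Returns (matched, updated_weights).
    """
    if self_joints == exemplar_joints:
        # Match: register success, reinforce the weights; movement stops.
        return True, [w + reward for w in weights]
    # No match: leave weights alone and let movement continue.
    return False, weights
```

A match here is a bare equality test; a fuller version would also compare head positions and any temporal sequence, as described above.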
<br />
It should be noted that the compartmentalization of functions in Figure 2 need not correspond to how the system is actually constructed. A neural model might make it possible, or even necessary, to tightly integrate two or more, or even all, of the functions that are treated as separate in this analysis.<br />
<br />
<b>Constructing the ApeWorm World</b> (2012-07-11)<br />
<br />
In this draft I will try to avoid using any particular programming language to illustrate the mechanisms of action. If necessary, I will use pseudocode. When the analysis is finished and it is time to try to run simulations, I will try to remember to insert links to the code samples.<br />
<br />
The ApeWorm world (or dual worlds) does not need to be defined explicitly. The upper and lower limits for the ApeWorm coordinates will suffice. As shown in Figure 1, the allowable coordinates run from 0 to 4 on both axes.<br />
<br />
<img alt="Figure 1, ApeWorm" height="380" src="http://www.openicon.com/mu/mu_blog/2012/images/ApeWorms.png" width="330" /><br />
<i>Figure 1</i><br />
<br />
ApeWorms themselves have four segments, but require five coordinate points to describe fully. Each point has an x1 component and an x2 component. The points will be called A0(x1,x2), A1(x1,x2), A2(x1,x2), A3(x1,x2), and A4(x1,x2).<br />
<br />
The data we gods will use to track ApeWorms are not the same data that the ApeWorms themselves track. An ApeWorm will know where its head, or segment one, is on the world grid. In other words, it will know A0(x1,x2) and A1(x1,x2). Each intersegment node (joint) will be able to convey to the control system one of three states: left, center, or right (L, C, R). Handedness will be determined looking from the higher-numbered segment: for joint 1, we look toward segment 1 from segment 2. Thus in the curled ApeWorm example in Figure 1 the joints are all in the right, or R, configuration.<br />
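For concreteness, here is one way (a sketch, not a specification) that the full body could be reconstructed from exactly the data the worm itself senses: the head points A0 and A1 plus one L/C/R state per joint. The turn convention below (L meaning a 90-degree counterclockwise turn of the heading, R clockwise) is an assumption for illustration; since handedness is judged looking from the higher-numbered segment, the real convention may come out reversed.

```python
# Hypothetical turn table: joint state -> new heading (dx, dy).
TURN = {'L': lambda dx, dy: (-dy, dx),   # assume L = rotate heading 90 deg CCW
        'C': lambda dx, dy: (dx, dy),    # C = keep going straight
        'R': lambda dx, dy: (dy, -dx)}   # assume R = rotate heading 90 deg CW

def body_points(a0, a1, joint_states):
    """Return the five points A0..A4 given the head and three joint states."""
    points = [a0, a1]
    heading = (a1[0] - a0[0], a1[1] - a0[1])   # direction of segment 1
    for state in joint_states:                  # joints 1, 2, 3 in order
        heading = TURN[state](*heading)
        x1, x2 = points[-1]
        points.append((x1 + heading[0], x2 + heading[1]))
    return points
```

For example, `body_points((0, 0), (1, 0), ('C', 'C', 'C'))` gives the straight worm along the bottom row, while mixed L/C/R states trace out curled configurations like those in Figure 1.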
<br />
The joints are controlled by pairs of opposing virtual muscles. When a muscle receives no signal it is relaxed; it contracts as the signal increases. An algorithm will determine which of the three states a joint is in based on the relative strengths of the control signals to its two muscles.<br />
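A minimal version of that algorithm might look like this (the deadband threshold and the exact mapping are illustrative assumptions, not part of the design):

```python
def joint_state(left_signal, right_signal, deadband=0.2):
    """Map the two opposing muscle signals to a joint state (L, C, or R).

    If the two signals are within `deadband` of each other the joint stays
    centered; otherwise the stronger muscle pulls the joint its way.
    """
    if abs(left_signal - right_signal) <= deadband:
        return 'C'
    return 'L' if left_signal > right_signal else 'R'
```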
<br />
The ApeWorm's ability to "view" another ApeWorm is stereoscopic but otherwise simple. The virtual retinas coincide with the x1 and x2 axes. Each unit segment of these axes can "see" only a body segment in its own row or column, and if there are two body segments in a column it sees only the closer one. [But a second segment in a column might be inferred by the control system.] A sensor can see the segment number, if any, and the distance, that is, the cross-coordinate the segment lies on. Thus in the upper-left example in Figure 1, the sensors on the x1 (horizontal) axis will record nothing. The x2 sensor closest to the origin will record segment 4 at x1 = 1. The x2 sensor between 1 and 2 will record segment 3 at x1 = 2, and so on.<br />
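The x2-axis retina can be sketched as follows. The point-list representation and the convention that segment k runs from A(k-1) to A(k) follow the earlier paragraphs; everything else, including the function name and the row indexing, is an assumption for illustration.

```python
def x2_retina(points):
    """Left-edge retina: for each of the 4 unit sensor rows, report
    (segment_number, x1) or None.

    Sees only body segments parallel to the x2 axis (vertical segments);
    if a row holds two such segments, only the one nearest the axis shows.
    """
    readings = [None] * 4
    for k in range(1, len(points)):               # segments 1..4
        (x1a, x2a), (x1b, x2b) = points[k - 1], points[k]
        if x1a != x1b:                            # horizontal: edge-on, not seen
            continue
        row = min(x2a, x2b)                       # unit row the segment spans
        if readings[row] is None or x1a < readings[row][1]:
            readings[row] = (k, x1a)              # keep the closest segment
    return readings
```

A worm standing straight up at x1 = 1 fills all four sensors; a worm lying along the x1 axis is invisible to this retina (though not to the other one).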
<br />
Next: ApeWorm Brain Overview (to be constructed)<br />
<br />
<b>ApeWorm, an Initial Machine Ape Project</b> (2012-07-10)<br />
<br />
The core of the system we seek is quite complex. We would like to minimize the inputs and outputs in order to focus our first project on the core aping system. At the same time, we want a general solution that is scalable to projects with larger, more complex input/output and memory requirements.<br />
<br />
We will call our first attempt ApeWorms. These creature constructs will exist on a 5 x 5 grid and will have, for bodies, four connected line segments, the ends of which line up on the grid. Any two adjacent line segments will be able to have three relative positions: left, straight, and right {L, S, R}. An ApeWorm cannot have two segments overlapping each other. Figure 1 shows a few possible ApeWorm configurations.<br />
<br />
<img alt="Figure 1, ApeWorm" height="380" src="http://www.openicon.com/mu/mu_blog/2012/images/ApeWorms.png" width="330" /><br />
<em>Figure 1</em><br />
<em><br /></em><br />
While ApeWorms are physically simple, their neural networks, or control software, can be as complex as we need. What should an ApeWorm know about itself, and what should it be allowed to know about an exemplar ApeWorm, if aping it is to be a non-trivial task requiring an extensible aping machine?<br />
<br />
The ApeWorm will have one end designated the zero {0} end, with joints {1, 2, 3} and a {4} end. Each segment will be designated by its terminal joint, {1, 2, 3, 4}. Each joint will signal its relative position, going from the lower to the higher numbers, as L, S, or R (after training). The zero end and the 1 joint will be able to sense their positions in the grid, again after training. For illustration purposes joints may be color designated: 1 black, 2 green, 3 blue, 4 red. [Training adds a degree of complexity, which I suspect will be important as we move to systems more complex than ApeWorm.]<br />
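A configuration validity check follows directly from the rules so far: five grid points, unit-length segments, no segment traversed twice (the no-overlap rule), everything inside the 5 x 5 grid. The point-list representation is an assumption carried over for illustration; nothing here is prescribed by the design.

```python
def is_valid_worm(points):
    """True if the five points form a legal 4-segment ApeWorm."""
    if len(points) != 5:
        return False
    edges = set()
    for (x1a, x2a), (x1b, x2b) in zip(points, points[1:]):
        if abs(x1a - x1b) + abs(x2a - x2b) != 1:      # must be a unit segment
            return False
        edge = frozenset([(x1a, x2a), (x1b, x2b)])
        if edge in edges:                              # two segments overlap
            return False
        edges.add(edge)
    # Every point must lie on the 5 x 5 grid (coordinates 0..4).
    return all(0 <= x1 <= 4 and 0 <= x2 <= 4 for x1, x2 in points)
```

Note that the duplicate-edge test also rules out a worm doubling back on itself, which is the simplest way to violate the no-overlap rule.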
<br />
How would the ApeWorm know about its exemplar? Note that it would be a trivial task, if exact information about the exemplar is known, to assign that information to the ApeWorm directly in software. But if we built two ApeWorms in the flesh, so to speak, we should have autonomous systems that require such information to cross space, so that information about position becomes encoded in some other form.<br />
<br />
We will allow the ApeWorm to have two "eyes" on the Exemplar worm. The eyes (retinas) will be at the bottom and left sides of the grid. Each eye can detect segments that are broadside to it, and their orientation by row or column and by distance from the eye.<br />
<br />
It is tempting to ignore the temporal aspect in the first build of ApeWorm. It seems difficult enough to construct a general aping ability when the minimum requirement is to detect the position of the Exemplar worm and then provide for the ApeWorm to conform itself to a copy of that. However, going forward we will want temporal input and temporal aping.<br />
<br />
Consider that the aping instinct evolved in animals only after basic lower functions were in place. An animal has to sense the environment, coordinate itself, catch food and escape predators just to survive. All that puts a high premium on temporal capabilities. Aping sits atop all those capabilities, just as language and culture sit atop aping. We have freed our ApeWorms from the need to eat, but we run the danger of making our aping system hard to integrate into robots and more complex programs if it is unable to at least deal with time sequences.<br />
<br />
Next: Constructing the ApeWorm World<br />
<br />
<b>Specifications for the Language Machine</b> (2012-06-11)<br />
<br />
<b>Aping as the Basis of Intelligence (cont.)</b><br />
<br />
Typically, artificial intelligence systems designed for language have a front-end feature detection system. The slight fluctuations in air pressure that we call sound are analyzed for features. In the case of human language these features are often quite complex, but at this point they are well studied, and detectors have been devised for common syllables and voice ranges.<br />
<br />
In a developing human there are likely some very generalized feature detectors, but they are also very flexible. This would also be true in the mammals and birds that have shown they can learn some human words. Thus a human baby can learn a click-based language from Africa, the simple syllables of modern English, or a tonal Asian language. In effect, feature detectors evolve based on exposure to language.<br />
<br />
Voicing is also complex, controlled by a wide range of muscles. It too is learned, and requires considerable practice to perfect. Aping the voices of other humans is the primary method of learning to speak so as to be understood.<br />
<br />
Four major input/output streams can be defined for a human-like language machine. There is audio input from the ears. There is output to a variety of muscles that produce sounds and speech. There are other inputs, ultimately external to the body, necessary to provide positive and negative behavior reinforcement, such as touch. And there are internal desire (or rejection) inputs, notably hunger and other discomforts or wants. There is also a need for decision making: given all the other inputs, deciding what sounds to make and when. This decision making could be incorporated into the language machine or could be external; it is probably some combination of both in humans.<br />
<br />
<b>Aping as the Basis of Intelligence</b> (2012-06-05)<br />
<br />
Logic was thought to be the key paradigm for human intelligence from Aristotle to Wittgenstein, and then in the early days of Artificial Intelligence as implemented on computers.<br />
<br />
Careful consideration shows that aping, the ability to copy an action of another individual, is more central to intelligence. It is as much a mammal trait as a human one. Formal exercises in logic are not needed for one animal to learn behavior from another. The complex neural circuitry created by evolutionary pressures to enhance aping potential eventually became the basis of human language and its formalized reasoning skills known as logic.<br />
<br />
Aping consists of observing (usually visually) what some other individual is doing, and then copying that behavior. Humans do this so naturally that we seldom think about it consciously. Practicing logic, by contrast, is a much harder skill unless one has been trained to do so (by aping others). Yet logic circuits are easy to implement with electronic parts, while aping another individual requires very complex systems within the brain.<br />
<br />
This essay, and the accompanying computer programs and physical systems, is an attempt to create machines capable of aping, which then should be able to handle other complex behaviors usually associated with human intelligence, including language, mechanical skills, reasoning, and possibly consciousness.<br />
<br />
The advantage of aping skills goes beyond helping to ensure the survival of an individual animal. With aping, a behavior with survival benefits can become cultural, spanning an unlimited number of generations of animals, without any genetic change specific to the behavior itself. A generalized pool of neurons can become capable of generating a wide and flexible variety of behaviors, giving a species a heavy advantage over animals that can only engage in behavior based on pre-programmed neural systems.<br />
<br />
<strong>Pre-language example</strong>. Consider a simple aping process and how it differs from paradigms like ordinary remembered (learned) behavior. An animal, a young wolf, has always crossed a cold stream by simply plunging through it. One day it sees an older wolf stop at the stream and proceed along the bank to a point where there is a rock in the center of the stream. The older wolf jumps to the rock, then to the far bank, thereby avoiding getting wet while crossing. The young wolf then apes that behavior. Afterwards, when crossing the stream in that area (when not in hot pursuit of prey), the young wolf diverts to the special crossing point.<br />
<br />
Note how the aping behavior differs from what might be other reasons for using the rock to cross. The young wolf could learn of the rock crossing by exploration, then would remember the location of the crossing when appropriate. You could argue that following an older animal might be an instinct, so this is not a real example of aping, just a side effect of following.<br />
<br />
Aping behavior might have originated in following behavior, and the basic ability to learn from experience, but as more complex situations are involved, it becomes clear that aping is a special case requiring special capabilities.<br />
<br />
Supposing this simple example constitutes a form of aping, what can we say about how it is accomplished by the wolf brain? The wolf is capable of recognizing an externalized self. The example wolf, to the aping wolf, represents a possibility for itself. Metaphorically, the aping wolf can see itself crossing the stream using the rock when it sees the example wolf. So the wolf-brain is capable of contemplating (in a very simple, non-philosophic way) the future. It can choose to ape the example, or it can refuse to cross the stream, or it can ignore the example and get wet and cold crossing the stream.<br />
<br />
Aping, in the infants of actual apes including humans, is probably automatic. Only at a later time will the young apes start choosing whether or not to ape particular behaviors. There is typically outside support for aping in human children. We reward them when we like their aping efforts, but may discourage them when we don't like their efforts (as when they engage in behaviors culturally reserved for adults).<br />
<br />
Consider the "simple" modern act of making a piece of toast with a bread-toasting appliance. A child is not likely to learn to do this by accidentally taking a slice of bread, sticking it into the appliance, and pushing down the lever. A child learns this by watching an example, or exemplar. The child understands that doing what an exemplar does leads to the same result. The child sees that taking a piece of bread, placing it in a certain way in the appliance, and pressing the lever results in the making of toast. Usually, if sufficiently coordinated, a child can do this on the first try. Aping often results in some level of success in achieving a goal on the first try.<br />
<br />
To ape an exemplar the aping child must already have similar levels of muscle control and sense of where its body is in relation to its surroundings. It must understand that aping the exemplar's movements (or other behaviors like vocalizing) achieves a desired result. If an aping attempt is made that does not achieve the desired results, a child may conclude that something is missing from the aping process, a subtlety or trick.<br />
<br />
To ape, the brain must have a number of capabilities, which might be summed up as the ability to coordinate the body of the subject with the body of the exemplar.<br />
<br />
In humans, when children are quick to ape adults, we call them intelligent. If they are slow to imitate adult behaviors, we call them slow or less intelligent.<br />
<br />
<strong>Language learning example</strong>. Human babies start learning to imitate vocalizations soon after birth. Vocal fussing often results in rewards, like feeding, encouraging further vocalization. The process may seem slow, given that mechanical devices have long existed that can record and play back sounds and language. But there is a fundamental difference between the mechanical playback model and the aping vocalization model, a difference critical to reconstructing how humans are capable of intelligent behavior, including understanding how the world works. [I prefer Machine Understanding as a term for what we are working towards with computerized robots capable of aping, because Artificial Intelligence has largely ignored a set of crucial problems that logic circuits have difficulty handling.]<br />
<br />
The human brain does not appear to be a recording medium in the sense that a phonograph, magnetic tape machine, or silicon-memory based sound recording and playing device is. The human ability to immediately repeat a phrase of language or hum a melody does not arise from a like mechanism. Humans have fairly good memories, but a weak ability to memorize.<br />
<br />
Consider the problem, for the human brain, of learning to repeat its first few spoken words, say "mama, papa, no, puppy, bye, milk." In the ear there are a number of sensory cells that respond to different frequencies of sound. They send nerve impulses towards the brain. At the other end we have the sound-making machinery: lungs, vocal cords, and mouth. An infant screaming soon after birth shows that the machinery works, but screaming is nothing like language. Months of hearing and experimental vocalization follow. There may be feedback, positive or negative, from parents and other exemplars, as well as careful repetition of simple words in the hope that this will induce aping by the child.<br />
<br />
In the end, however, success is the ability of the child to say a word that sounds, to its own ear, like the kind of words spoken by the exemplars. It will not be identical, nor will it ever be identical in its fine structure; in fact it will be identifiably different, an individual voice. Six words mastered, many more will follow, all aped after exemplars. In addition, the words will have meaning: they will coordinate with objects or actions in the external world (or the internal world, as with "more" or "hurts").<br />
<br />
The neural machine that coordinates heard words and the speaking of words with an "ego" must be very complex and very capable. Recreate that machine and we would have a model that, perhaps with some specialized variations, would be able to ape visual input to body positions, or anything else required of it.<br />
<br />
<b>Predictive Memory and Particle Filters</b> (2012-01-22)<br />
<br />
I keep thinking about the particle filter AI method, so I might as well write some of it down. Note that in AI "particle filter" has a particular meaning, different from the physics of particles in a real-world simulation or in a video game. In AI, filtering is a method for <em>localization</em> of an object, which could be a large physical object like a robot. It involves probabilistic methods (Monte Carlo techniques) for predicting where the object might be, combined with sensory feedback. Typically it requires a pre-existing map of the territory (which for a robot is real, but which could be highly abstract in other cases) and a method for matching the feedback to the map. It has most notably been used as an essential software component of the <a href="http://www.nytimes.com/2010/10/10/science/10google.html">Google self-driving car</a>. A somewhat more extensive overview can be found at <a href="http://en.wikipedia.org/wiki/Particle_filter">Particle Filters</a>.<br /><br />The reason I keep thinking about it is that I started noticing how I consciously (and probably subconsciously) localize myself in the physical world, most notably when I am navigating in the dark in my house. I do this to avoid backtracking: turning on a light, going back to the prior switch and turning it off, and so on. 
In any case I know my house well, but in just a few feet in the dark it is easy to get a bit off course and bump into furniture or a wall where I expected a doorway. The stairs can be tricky too.<br /><br />So I walk a few feet in what I guess is the right direction based on experience. Then I reach out to touch what I expect is a wall or piece of furniture. Or I note a change from rug to bare floor, or spot an LED. Usually I get the feedback I expect, but sometimes I find I am off course and need to make a correction.<br /><br />In particle filter localization terms, I project a probabilistic range of particles representing where I, the human robot, might be next. Then I check. Often I know exactly where I am after a check, but sometimes all I know is that I am not near a wall or furniture piece. Then I make another mental projection, take a step or two depending on my expectations and level of caution, and try again to touch something. I go through similar mental gymnastics when driving a car or taking a daylight walk, but the feedback is visual rather than tactile.<br /><br />I went through a similar analysis process after reading Jeff Hawkins's theory of predictive memory in <em>On Intelligence</em>. The theory is that the brain largely operates by making constant predictions, based on memory of historical patterns, and checking them against sensory input. When input varies from predictions, neurons send special novelty-detection signals to higher brain centers that deal with novelty. For instance, as you read this your brain knows a number of words might be included in this zebra. See, you expected "sentence" or maybe "essay" or the like, you were not expecting "zebra", and so the normal flow of processing was interrupted.<br /><br />I have not seen a neural network model for particle filters. It could be that the way human brains seem to work only superficially resembles particle filters. 
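The dark-house routine maps directly onto the particle filter loop: predict, weigh against sensory feedback, resample. Here is a deliberately tiny one-dimensional sketch in Python; the hallway map, the noise levels, and the touch-sensor model are all invented for illustration and come from no particular system.

```python
import random

WALLS = [0.0, 3.0, 7.0]          # hypothetical hallway map (wall positions)

def touch_reading(position):
    """1 if a wall is within arm's reach (0.5 units), else 0."""
    return int(any(abs(position - w) < 0.5 for w in WALLS))

def step(particles, move, observed):
    # Predict: advance every particle, with some motion uncertainty.
    particles = [p + move + random.gauss(0, 0.1) for p in particles]
    # Weigh: particles that agree with the actual touch reading score high.
    weights = [0.9 if touch_reading(p) == observed else 0.1 for p in particles]
    # Resample: survival of the best-matching guesses.
    return random.choices(particles, weights=weights, k=len(particles))

random.seed(1)
particles = [random.uniform(0.0, 8.0) for _ in range(500)]
for _ in range(5):               # stand still while feeling a wall
    particles = step(particles, move=0.0, observed=1)
# The particle cloud now clusters near the walls in the map.
```

Each touch in the dark plays the role of `observed`, and the surviving particle cloud is the narrowed-down guess about where in the house I am.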
More generally we are talking about <strong>feedback</strong> systems, which can take a number of forms. Even very primitive animals that lack differentiated neurons have feedback systems, so there could be many types of neural feedback schemes operating in the human brain.<br /><br />I often notice my dog doing something that makes me think that most mammals, including pre-sapiens Homo species, could do it too. Yet when these processes, say chasing a rabbit, are abstracted, they can become quite complex. I can imagine chasing a rabbit. I can plan to chase a rabbit into a trap. I can pursue something more abstract, say a solution to a construction problem or even a math problem, in a way that resembles chasing a rabbit. Let's see, the solution of this <a href="http://en.wikipedia.org/wiki/Differential_equation">differential equation</a> has disappeared; it is not obvious. I'll beat the bushes, using three techniques known to help solve this type of equation, most likely technique first. Hopefully one technique will put the problem in a form that I recognize from the standard map for solving such equations.<br /><br />How would particle filters be implemented with neural networks? Let me know your thoughts.<br />
<br />
<b>Stanford AI Class wrap up</b> (2011-12-26)<br />
<br />
I managed to muddle my way through the free Internet version of the Stanford Introduction to Artificial Intelligence (AI) course. "Congratulations! You have successfully completed the Advanced Track of Introduction to Artificial Intelligence ..." says my Statement of Accomplishment.<br /><br />Before putting on my analyst mask, I would like to thank Stanford University and particularly the instructors, Sebastian Thrun and Peter Norvig, for conducting the course, and in particular for making it free of charge. 
I am hoping they will leave the instruction videos up for a while; there are some I would like to go over again.<br /><br />I got a good review of Bayes Rule, basic probability, and some simple machine learning algorithms. I had not worked on planning algorithms before, so that was of some interest. Markov models had always been a bit vague to me, so that section helped me nail down the idea. Games and game theory seem to have made little progress since the 1940's, but I guess they have to be covered, and I did get clear on how MinMax works. Computer vision seemed kind of weak, but then you can't assume students know basic optics, and at least we learned how to recognize simple features. Robotics was a prior interest for me, and I did not know about Thrun's obvious favorite, <a href="http://en.wikipedia.org/wiki/Particle_filter">Particle Filters</a>, which are a useful paradigm for spatial positioning (aka localization).<br /><br />The last two units were on Natural Language Processing, and that is a good place to start a critique (keeping in mind that all this material was introductory). Apparently you can do a lot of tricks processing language, both in the form of sounds/speech and written text, without the algorithms understanding anything. They showed a way to do pretty decent translations between human languages, but the usefulness depends on humans being able to understand at least one language.<br /><br />Plenty of humans do plenty of things, including paid work, without understanding what they are doing. I suppose that could be called a form of artificial intelligence. Pay them and feed them and they'll keep up those activities. But when people do things without understanding (I am pretty sure some of my math teachers fell into that category), danger lurks.<br /><br />The Google Car that drives itself around San Francisco (just like Science Fiction!) just demonstrates that driving a Porsche proves little about your intelligence capabilities. 
Robot auto-driving was a difficult problem for human engineers to solve. They were able to solve it because they understood a whole lotta stuff. Particle filters, which involve probability techniques combined with sensory feedback to map and navigate an environment, are a cool part of the solution. If I say "I understand now: I have been walking through a structure, and to get to the kitchen I just turn left at the Picasso reproduction," I may be using the word <em>understand</em> in a way that compares well with what we call the AI capabilities of the Google Car. Still, I don't think the Car meets my criteria for machine understanding. The car might even translate from French to English for its human cargo, but I still classify it as dumb as a brick.<br /><br />Hurray! Despite my advancing age, lack of a Ph.D., less-than-brilliant business model, and tendency to be interested in too many different things to be successful in this age of specialization, I find that no one seems to have gotten to the essence of how humans understand, and are aware of, the world and themselves.<br /><br />If the human brain, or its neural network subcomponents, did particle filters, how would that work? I know from much practice that bumping around in the dark can lead to orientation or disorientation, depending on circumstances. On the other hand, the random micro-fine movements of the eye might be a physical way of generating randomness to test micro-hypotheses that we are not normally consciously aware of.<br /><br />We sometimes say (hear <a href="http://en.wikipedia.org/wiki/Wittgenstein">Wittgenstein</a> in my voice) that someone has a <em>shallow understanding</em> of something. "Smart enough to add columns of numbers, not smart enough to do accounting," or "Good at game basics, but unable to make strategic decisions." Let me put it another way: in some ways the course itself was an intelligence test. 
I imagine it would be very rough for anyone without a background in algebra and basic probability theory. The students in the class already knew a lot, and had to <em>learn difficult things</em>.<br /><br />I want to know how our bodies, our brains, learn difficult things. The only way I will be able to be sure that I understand how that is done is if I can build a machine that can do the same thing.<br />
<br />
<b>stuck</b> (2011-11-16)<br />
<br />
Maybe we need the equivalent of a quantum hypothesis for the brain, in the sense that it may not be possible to construct a correct model based on classic principles, no matter how complex.<br />
<br />
<b>Stanford AI Class thoughts, and a brief tour of AI history</b> (2011-10-24)<br />
<br />
<div align="center">"People who face a difficult question often answer an easier one instead, without realizing it."<br />— Daniel Kahneman</div><br />Despite time management issues (which will only get worse this week) I managed to struggle through the first two weeks, or four units, of the online Stanford introduction to artificial intelligence course.<br /><br />In the past I had already tried to struggle through Judea Pearl's <em>Probabilistic Reasoning in Intelligent Systems</em>. That was published well over 20 years ago, and yet this course uses many of the same examples. The course is much more about working out actual examples; it is practical, not so theoretical. We've covered Bayes networks and conditional probability, both concepts I had already learned because <a href="http://www.numenta.com/">Numenta</a> was using them. 
Pearl's book contains a lot of material about wrong directions to take; the Stanford course seems to be focused on what actually works, at least for Google.<br /><br />My impression is still that the <strong>Stanford AI paradigm</strong>, while very practical, is not going to provide the core methods for truly intelligent machines, which I characterize as machine understanding. I think this largely because I am ancient and have watched quite a few AI paradigms come and go over the decades.<br /><br />When AI got started, let's say in the 1950's, there was an obsession with <strong>logic</strong> as the highest form of human intelligence (often equated with <strong>reasoning</strong>). That computers operated with logic circuits seemed a natural match. Math guys who were good at logic tended to deride other human activities as less difficult and requiring less intelligence. Certain problems, including games with limited event spaces (like checkers), could be solved more rapidly by computers (once a human had written an appropriate program) than by humans. The expectation was that by the sixties, or at the latest the seventies, computers running AI programs would be smarter than humans. In retrospect, this was idiotic, but the brightest minds of those times believed it.<br /><br />One paradigm that showed some utility was <strong>expert systems</strong>. To create one of these, experts (typical example: a doctor making a diagnosis) were consulted to find out how they made decisions. Then a flow chart was created to allow a computer program to make a decision in a similar manner. As a result, a computer might appear intelligent, especially if provided with the then-difficult trick of imitation voice output, but today no one would call such a system intelligent. 
That is no more intelligent than the early punch card sorters that once served as input and output for computers that ran on vacuum tubes.<br /><br />In the 1980's there was a big push in <strong>artificial neural networks</strong>. This actually was a major step towards machines being able to imitate human brain functions. It is not a defunct field. Some practical devices today work with technologies developed in that era. But with scaling, the problems grew faster than the solutions. No one could build an artificial human brain out of the old artificial neural networks. We know that if we can exactly model a human brain, down to the molecular (maybe even atomic) level, we should get true artificial intelligence. But simplistic systems of neurons and synapses are not easy to assemble into a functioning human brain analog.<br /><br />The <strong>Stanford model</strong> for AI has been widely applied to real world problems, with considerable success. This probabilistic model allows it to deal with more complex data than the old logic and expert system paradigms ever could. Machine systems really can learn new things about their environment and store and organize that information in a way that allows for practical decision making. Clearly that is one thing human brains can do, and it is a lot more difficult than playing in a set-piece world like tic-tac-toe or even chess.<br /><br />Sad as the state of human reasoning can be at times, and as slow as we are to learn new lessons, and as proud as we are of our slightest bouts of creativity (and as much as we may occasionally ignore the rule against run-on sentences), I think the Stanford model is not, by itself, going to lead to machine understanding. The human brain has a lot of very interesting structures at the gross level and at the synaptic level. Neurologists have not yet deciphered them. 
Their initial "programming," or hard-wiring, is purely the result of human evolution.<br /><br />When is imitated intelligence real intelligence? When does a machine (or a human, for that matter) understand something, as opposed to merely changing internal memory to reflect the external reality?<br /><br />Then again, maybe a Stanford model computer/program/input/output system would have done better at the Stanford AI course than I have. I certainly have not been getting all the quizzes and homework problems right on the first try. On the other hand, I think it will be a good long time before a machine can read, say, a book on neurology and carry on an extended intelligent conversation about it.William P. Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0tag:blogger.com,1999:blog-3020437366892008573.post-40628944935690792622011-09-22T11:21:00.000-07:002011-09-22T11:42:04.517-07:00Brain Modeling Computational Trajectory<a href="http://www.sgi.com/">SGI</a> is a manufacturer of high-performance computers, what might be called small supercomputers. I was listening to their analyst day (I own stock in SGI) this morning and saw an interesting slide, which I reproduce here:<br /><img alt="brain modeling projection" src="http://www.openicon.com/mu/images/brain_model_projection.jpg" width="450" height="300" /><br /><br />With regard to machine understanding, this is the direct assault method. At some point when the human brain is modeled in sufficient detail the construct should display human memory, intelligence, and even consciousness or self-awareness. It is conceivable that a detailed computer model might exhibit artificial intelligence or understanding but leave us still unable to comprehend the essence of what is happening. 
More likely we will be enlightened and therefore able to construct working machine entities that have true intelligence and understanding, but are not exactly modeled on the human nervous system.<br /><br />There are different levels for modeling biological systems of neurons and brain matter. We might model on the atomic, molecular, sub-cellular, cellular, or neural-functional level. It is not clear what level of detail is assumed in the SGI projection. My best guess is that it has a model for individual neurons, with complexity added by increasing the number of neurons in the network. That would be the main difference between modeling a "cellular neocortical column," a mesocircuit, a rat brain, and a human brain.<br /><br />SGI already makes computers for researchers in this field. Of course other vendors' computers can be used. The advantage SGI brings to the table today is the ability to build a large model in computer memory chips (the processors see a big, unified memory), as opposed to hard disks.<br /><br />I would be very impressed if someone could correctly model a functioning rat brain by 2014. Keep in mind that just because the computer power is available to do it does not mean that any given team's model is correct. I wonder what proof of concept would consist of. To test such a brain you would need a test environment. That might be a simulation, but it could also be robotic. Keep in mind that much of what a rat brain does relates not to what we think of as awareness of the environment, but to maintaining body functions.<br /><br />Another issue is the initial state problem. Suppose that you "dissect" a rat so that you know the relative placement of every neuron in its central nervous system. Still, to make your model work, you would need a functional initial state for all the cells. You need to know how synapses are weighted. Probably someone is working on the boot-up of mammal brains during embryonic development. 
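For a sense of why projections like SGI's climb so steeply, here is back-of-envelope arithmetic for a neuron-level model. Every number below is a loudly rounded assumption: neuron and synapse counts vary by source, and the operations-per-synapse and timestep figures are pure guesses on my part.

```python
# Back-of-envelope compute for neuron-level brain models.
# All counts are rounded assumptions, not measurements.
brains = {
    # name: (neurons, synapses per neuron) -- assumed round numbers
    "rat":   (200e6, 8e3),
    "human": (86e9, 7e3),
}
UPDATES_PER_SEC = 1000   # assumed model timestep of 1 ms
OPS_PER_SYNAPSE = 10     # assumed arithmetic per synapse update

for name, (neurons, syn_per_neuron) in brains.items():
    flops = neurons * syn_per_neuron * UPDATES_PER_SEC * OPS_PER_SYNAPSE
    print(f"{name}: ~{flops:.1e} ops/sec")
```

Under these guesses a rat-scale model already wants on the order of 10^16 operations per second and a human-scale one a few 10^18 — and that is before worrying about whether the per-neuron model, or the initial state, is right.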
Even just getting the genetically pre-programmed neurotransmitter types for each synapse to each cell seems like a more difficult problem than making a generalized computer model based on a neural map plus a generalized neuron.<br /><br />Apparently there is plenty of work on this project for everyone.William P. Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0tag:blogger.com,1999:blog-3020437366892008573.post-8016036367485041042011-08-16T09:20:00.000-07:002011-08-16T09:24:35.914-07:00Stanford AI courseI signed up for the free Stanford course, <a href="http://www.ai-class.com/">An Introduction to Artificial Intelligence</a>, to be taught by Sebastian Thrun and Peter Norvig this fall. I figure it can't hurt, and free is a good price.
<br />William P. Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0tag:blogger.com,1999:blog-3020437366892008573.post-91419986256655648682011-07-21T08:56:00.000-07:002011-07-21T09:17:14.824-07:00Naive Bayes Classifier in PythonI have been busy with many things: indexing a book on software management; trying to learn math things I would have learned when I was 18 or 19 if the Vietnam War had not made me decide to major in Political Science; advising investors about the value of <a href="http://www.openicon.com/confsums/hnsn_main.html">Hansen Medical's robotics</a> technology and <a href="http://www.iteris.com/">Iteris</a>'s visual analysis software for cars and highway monitoring.<br /><br />So even though I continue to think about MU, and to study, I have had nothing in particular worth reporting here. However, I came across a good introductory page on using Bayes probability with the Python programming language. If that is what you need, here it is:<br /><br /><a href="http://www.python-course.eu/text_classification_introduction.php">Text Categorization and Classification in Python with Bayes' Theorem</a><br /><br />And I would love to play with a Kinect, but where would the time come from?William P. Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0tag:blogger.com,1999:blog-3020437366892008573.post-59615241763670248562011-04-18T15:37:00.000-07:002011-04-18T15:47:04.038-07:00Handling Complex Numbers<p>I have not been logging my MU activity very well. I am trying to develop a simple system to test my ideas. The first one bogged down and went nowhere. At the same time I have been looking at a lot of possibly useful math, including Hilbert spaces. </p><br /><p>Today I was thinking about neural network handling of complex numbers. Decided to let others do my thinking for me and came upon this paper after a search. 
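The basic move in complex-valued networks is easy to sketch: weights and inputs are complex, so each weight rotates its input as well as scaling it. The toy neuron below, with its made-up weights and a magnitude-squashing, phase-preserving activation, is only my own minimal illustration of the idea, not the model from the paper mentioned above.

```python
import cmath

# Minimal complex-valued neuron: each weight both scales and rotates
# its input in the complex plane. Weights and bias are arbitrary toys.
def complex_neuron(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    # One common activation choice: squash the magnitude, keep the phase.
    magnitude, phase = abs(z), cmath.phase(z)
    squashed = magnitude / (1 + magnitude)
    return cmath.rect(squashed, phase)

out = complex_neuron(
    inputs=[1 + 0j, 0 + 1j],
    weights=[0.5 + 0.5j, 0.25 - 0.25j],  # assumed toy weights
    bias=0.1 + 0j,
)
print(abs(out), cmath.phase(out))
```

The point of the exercise is that a single complex weight carries two degrees of freedom (gain and phase), which is what makes these networks interesting compared to their real-valued cousins.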
It reminds me that all science journals should be published on the net with no charge to read; that would accelerate the advances we all desire.</p><br /><p><a href="http://www.ece.umassd.edu/Faculty/hmichel/publications/JCNN0500.PDF">Enhanced Artificial Neural Networks Using Complex Numbers by Howard E. Michel and A. S. S. Awwal</a>. I don't know if I can use their specific model, but it is good food for thought. </p>William P. Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0tag:blogger.com,1999:blog-3020437366892008573.post-82221069965211854922011-03-01T18:18:00.000-08:002011-03-01T18:26:10.619-08:00gradient covarianceThinking, thinking, thinking. Trying to work towards an example.<br /><br />I spent more time than I intended making sure I understand how gradients act, their covariant nature under coordinate changes. Wrote out a "simple" example, just a 2D transformation of the gradient of a simple function. But it still took a while. I made an arithmetic, an algebra, and a calculus mistake the first time through; am I ever rusty!<br /><br />If you are looking for an example of covariance of a tensor or vector, try it:<br /><br /><a href="http://www.openicon.com/mu/math/gradient_covariance/gradient_covariance.html">gradient covariance</a><br /><br />Now to make neuron-like structures do the math for me ...William P. Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0tag:blogger.com,1999:blog-3020437366892008573.post-43407163290500121812011-01-22T10:30:00.000-08:002011-01-22T10:51:52.606-08:00bogged downEarlier this week I thought I was on the verge of a breakthrough, but instead I sank into the bog of mathematics, analytic issues, and philosophical delusions.<br /><br />There is a tendency to think of the identification of an object, say a dog or better still, a specific dog, as coinciding in the brain with the firing of a specific neuron, or perhaps a set of neurons. 
That might in turn fire a pre-verbal response that one could be conscious of, then the actual verbal response, whether as a thought or as speech: "Hugo," my dog.<br /><br />Some would make this a paradigm for invariance. Hugo can change in position, wear a sweater, age, or even die, but the Hugo object is invariant.<br /><br />But that, the noun, is the end result. It is not the system that creates invariance. Nor do I think that the system of building up small clues, as described by Jeff Hawkins and implemented to some extent by Numenta, is sufficient to explain intelligence, though it might serve for object identification.<br /><br />I am even wondering about Hebbian learning, in which transitions in neural systems are achieved by changing weights of neural connections. It is simple to model, but if it isn't what is really going on in the brain (or is only part of what is really going on), assuming it is sufficient could be a block to forward progress.<br /><br />Maybe I am way off track here. I just read again about how no one could explain all the spectral data accumulated in the 19th century. Then Bohr threw out two assumptions about electrodynamics and added a very simple assumption, that electrons near an atomic nucleus have a minimal energy orbit, and quantum physics finally was off to the races.<br /><br />On the other hand, sometimes a slow steady program like Numenta's works better than waiting for a breakthrough. I'm giving my neurons the weekend off and going to the German film festival at the Point Arena Theater.William P. Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0tag:blogger.com,1999:blog-3020437366892008573.post-65116901609413025322011-01-13T10:19:00.000-08:002011-01-13T10:28:13.690-08:00Pointing ChoicesQuote of the day:<br /><br />"This convention is at variance with that used in many expert systems (e.g. 
MYCIN), where rules point from evidence to hypothesis (e.g., if symptom, then disease), thus denoting a flow of mental inference. By contrast, the arrows in Bayesian networks point from causes to effects, or from conditions to consequences, thus denoting a flow of constraints attributed to the physical world."<br /><br />Judea Pearl, <em>Probabilistic Reasoning in Intelligent Systems</em>, p. 151William P. Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0tag:blogger.com,1999:blog-3020437366892008573.post-42209087030388783052011-01-12T11:51:00.000-08:002011-01-12T11:56:38.640-08:00Eyes Follow Brain Shifting of AttentionIn case you missed it:<br /><br /><a href="http://www.dailytech.com/article.aspx?newsid=20640">Human Brain Predicts Visual Attention Before Eyes Even Move</a><br /><br />This is a confirmation that one of the principal jobs of the brain is to make predictions, then use the senses to confirm or deny the predictions.William P. Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0tag:blogger.com,1999:blog-3020437366892008573.post-31416139133635748372011-01-03T14:14:00.000-08:002011-01-03T14:17:52.422-08:00Tensor IntroductionLargely as a prelude to my own work, I posted an <a href="http://www.openicon.com/mu/math/tensors_01.html">introduction to tensors</a> at OpenIcon. It may improve as time passes.William P. Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0tag:blogger.com,1999:blog-3020437366892008573.post-68035024242126603242010-11-27T09:40:00.000-08:002010-11-27T10:34:37.898-08:00Wandering through the multi-dimensional abyssI have not posted to this blog for a long time partly because I lost my focus on the <a href="http://www.numenta.com/">Numenta</a> program within the Machine Understanding arena, and partly because I have been distracted by the need to earn money, and even the political silly season. 
On a good note my friend Dan Hamburg got elected to be <a href="http://www.mendoday.com/">Mendocino County</a> Supervisor, and the government of California can now approve budgets with a majority vote. Which does not mean they will start doing budgets on time, or balance them, or run the state well; but they could.<br /><br />I found, slogging through <a href="http://en.wikipedia.org/wiki/Judea_Pearl">Judea Pearl's</a> <a href="http://www.amazon.com/gp/product/1558604790?ie=UTF8&tag=iiipublishing&linkCode=as2&camp=1789&creative=390957&creativeASIN=1558604790">Probabilistic Reasoning in Intelligent Systems</a><img style="BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; MARGIN: 0px; BORDER-TOP: medium none; BORDER-RIGHT: medium none" border="0" alt="" src="http://www.assoc-amazon.com/e/ir?t=iiipublishing&l=as2&o=1&a=1558604790" width="1" height="1" /><br />and Terrence Fine's <a href="http://www.amazon.com/gp/product/0130205915?ie=UTF8&tag=iiipublishing&linkCode=as2&camp=1789&creative=390957&creativeASIN=0130205915">Probability and Probabilistic Reasoning for Electrical Engineering</a><img style="BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; MARGIN: 0px; BORDER-TOP: medium none; BORDER-RIGHT: medium none" border="0" alt="" src="http://www.assoc-amazon.com/e/ir?t=iiipublishing&l=as2&o=1&a=0130205915" width="1" height="1" /><br />, that my mind kept fuzzing up. Maybe I am slowing down in my old age, but the real problem was that my last formal training in probability was when I was 19, and interpreting clinical trial p values is guestimate work. 
I had to regress to a simpler text than what I used in college, and I can recommend <a href="http://www.amazon.com/gp/product/0023447605?ie=UTF8&tag=iiipublishing&linkCode=as2&camp=1789&creative=390957&creativeASIN=0023447605">Finite Mathematics with Applications</a><img style="BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; MARGIN: 0px; BORDER-TOP: medium none; BORDER-RIGHT: medium none" border="0" alt="" src="http://www.assoc-amazon.com/e/ir?t=iiipublishing&l=as2&o=1&a=0023447605" width="1" height="1" /><br />by A. W. Goodman and J.S. Ratti for introductions to simple probability, conditional probability, <a href="http://en.wikipedia.org/wiki/Bayes%27_theorem">Bayes' Theorem</a>, and even Markov Chains that were simple enough for me to feel I really understood easy examples and the concepts themselves.<br /><br />But my wanderings have been further afield than that. I continue to be fascinated with tensors, and got a lot out of <a href="http://www.amazon.com/gp/product/0486425401?ie=UTF8&tag=iiipublishing&linkCode=as2&camp=1789&creative=390957&creativeASIN=0486425401">Introduction to Tensor Calculus, Relativity and Cosmology</a><img style="BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; MARGIN: 0px; BORDER-TOP: medium none; BORDER-RIGHT: medium none" border="0" alt="" src="http://www.assoc-amazon.com/e/ir?t=iiipublishing&l=as2&o=1&a=0486425401" width="1" height="1" /><br />by D. F. Lawden. Again, I never got to tensors in college (I ended up a Political Science major), and, thinking I was brighter than I really was (brightness is mainly a function of preparation, I now know), I started off with mathematical treatments that were too abstract for me to do more than pretend to follow.<br /><br />I have even gotten stuck on Maxwell's equations for electromagnetism. 
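For the record, the equations in question, in differential form (SI units):

```latex
\begin{aligned}
\nabla \cdot \mathbf{E} &= \frac{\rho}{\varepsilon_0} &
\nabla \cdot \mathbf{B} &= 0 \\
\nabla \times \mathbf{E} &= -\frac{\partial \mathbf{B}}{\partial t} &
\nabla \times \mathbf{B} &= \mu_0 \mathbf{J} + \mu_0 \varepsilon_0 \frac{\partial \mathbf{E}}{\partial t}
\end{aligned}
```

Four short lines, and yet the divergence and curl operators hide exactly the kind of detail one can look up without deeply understanding.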
Now we all should admit that if we read broadly in math and science we don't take the time to really understand everything; we trust our fellows to have done their homework before a set of facts or an equation is presented in a paper or textbook. We may like to feel we agree with quantum physics, but who except for professional physicists has the time to really look at the data and the math in detail? I have always assumed that <a href="http://en.wikipedia.org/wiki/Maxwell%27s_equations">Maxwell's equations </a>are correct, and that if I needed to I could look up the definitions of curl, etc., and do the math. But that is not the same thing as the deep understanding one gets from working in electromagnetics on a regular basis.<br /><br />I have wandered farther afield than that, to <a href="http://en.wikipedia.org/wiki/Lie_groups">Lie groups</a> and <a href="http://en.wikipedia.org/wiki/Galois_theory">Galois theory</a>, which may have nothing to do with machine understanding. Nevertheless, I wander. And I keep coming back to what is known about the structure of the cortex, of the actual tangles of nerve cells themselves, and in particular to the way pyramidal cells span multiple layers of the cortex with their intricate axons and dendrites. How do you create a math that represents such a tangle? Skipping that, you can do functional units as Numenta does, or you can follow the AI tradition of trying to get the end results without understanding the details of how neurons actually get stuff done.<br /><br />Right now I have little paid work going on, so I may be writing in the blog more often. If paid work becomes available, there will be more delays.William P. Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0tag:blogger.com,1999:blog-3020437366892008573.post-56725266798591817162010-08-26T13:21:00.000-07:002010-08-26T13:37:06.079-07:00Seeing Predictive Memory EverywhereNumenta cancelled its planned October class. 
I stopped working my way through their examples. I created the index for <em>Microsoft Excel 2010 Inside Out</em>. My stepson got married.<br /><br />Yet all the while I've been watching how my mind works in light of the predictive memory theory. What good would memories be if they did not allow animals to make predictions that help with survival? I have watched my mind make mistakes, in reading for example: wait, that doesn't make sense, I read "farming" for "framing." I watch my dog Hugo make decisions (mostly to not obey me). I watch other people make decisions.<br /><br />I also continue to ponder how the system works. There are computer models like Numenta's, and biological models. When I study math part of me is assessing its utility for modeling machine understanding. Even reading the Excel book, which I really liked, got me to thinking about how advanced Excel tools might be used to model neurons or probabilistic reasoning.<br /><br />But I can't say I have any breakthroughs to report. I can't even say I am going to be writing this blog on a regular basis. There are fires that need to be put out, and fires that need to be lit.<br /><br />I started on a simple demo program, I mean a <em>really</em> simple demo program, just to get going on flowing data through nodes. I started it in Visual Basic, with the intent of also doing it in Python and at least one other language, maybe C++. I might try to restart the MU project there, or I might go back to working through the Numenta examples. No promises. But if I do manage to get anything done, I'll post it.William P. 
Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0tag:blogger.com,1999:blog-3020437366892008573.post-86264652863723596182010-06-10T11:55:00.000-07:002010-06-10T12:00:12.287-07:00Understanding Probability and Probabilistic ReasoningI think I mentioned earlier that I have had problems with the <a href="http://www.numenta.com/">Numenta</a> model as described in "<a href="http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000532">Towards a Mathematical Theory of Cortical Micro-circuits</a>" because of the use of probability-based mathematics. It seems to me that neurons are deterministic mechanisms. But I have noted in the past that I can be pretty dim-witted at times, and decided to study Numenta's HTM systems anyway. I reminded myself that quantum physics has two different formulations, one based on matrix algebra and the other on the Schrodinger equation. They both work, and some brilliant person showed that they are formally the same long, long ago. So when thinking about or solving problems you can use whichever is easiest or gives the best insights. The same way some physics and math problems are easier in polar coordinates than in rectangular coordinates.<br /><br />Months ago I ground to a halt in my reading of Judea Pearl's <em>Probabilistic Reasoning in Intelligent Systems</em>, which provides much of the background to the Numenta discussion. Yesterday I decided to tackle it again and commenced reading at page 143. I noticed that some notation was ambiguous, which is typical of expert writers who assume their readers are right up with them. So I decided to go back and make sure that P(A,B) really does mean the probability that both A and B are true. I thought I'd make sure I understood the Bayes interpretation of probability as well.<br /><br />I ended up reading starting at page 29, Chapter 2, Bayesian Inference, 2.1, Basic Concepts, 2.1.1 Probabilistic Formulation and Bayesian Inversion. 
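Checking that P(A,B) really is the probability that both A and B hold takes only a few lines with Pearl's favorite two-die setting. The particular events below are my own choices, worked with exact fractions:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event (a predicate over outcomes)."""
    hits = sum(1 for o in outcomes if event(o))
    return Fraction(hits, len(outcomes))

A = lambda o: o[0] + o[1] == 7        # the sum is seven
B = lambda o: o[0] == 3               # the first die shows three

p_a_and_b = prob(lambda o: A(o) and B(o))     # P(A,B)
p_b = prob(B)
p_a_given_b = p_a_and_b / p_b                 # P(A|B), by definition

print(p_a_and_b)    # 1/36
print(p_a_given_b)  # 1/6
assert p_a_and_b == p_a_given_b * p_b         # the chain rule checks out
```

Fourth-grade fraction arithmetic, exactly as claimed below: P(A,B) is counting, and P(A|B) is just P(A,B) divided by P(B).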
Note that I took two semesters of logic and one semester of probability in college, and as part of my profession deal with biostatistics, the kind reported from clinical trials, on a regular basis. Note also that I have studied philosophic issues of quantum physics and even the math involved.<br /><br />Yet when I read this simple introduction this time, the scales fell from my eyes, or from my cortical networks.<br /><br />With probabilistic reasoning, it is fair to say that we are not talking about rolling dice (even though Pearl uses the familiar probabilities of two-die rolls to illustrate some points).<br /><br />We are talking about the math of probability theory. For most practical purposes, that is the math of fractions. Third or fourth grade stuff. (I had a fifth grade teacher I hated, Mrs. Lopez, who was all about memorizing things. We memorized the decimal equivalents of about 50 common fractions. I knew I could always get the decimal equivalent by dividing, so considered this a stupid exercise.)<br /><br />When thinking about human memory, you can safely substitute "percentage of like situations" for probability.<br /><br />Updating the "percentage of like situations" based on experience makes sense. Since we can test for novel situations, like "both A and B" or "A and not C, given B", by multiplying, adding, or subtracting fractions, these updates may affect a chain of knowledge or deductions across the brain (or mind, if you prefer).<br /><br />Calling all background information and assumptions a person has K (I don't know why K, maybe it stands for Knowledge), I quote Pearl page 30: "However, when the background information undergoes changes, we need to identify specifically the assumptions that account for our beliefs and articulate explicitly K or some of its elements."<br /><br />Many philosophers, notably Ludwig Wittgenstein, have shown how reasoning goes awry when we use one word to mean multiple things, or one thing that is vague or complex. 
We think we are being clear using logic symbols or math equations or tech speak. But when something is amiss, it may not be a problem with our reasoning. It may be that we need to update our background assumptions.<br /><br />See also <a href="http://en.wikipedia.org/wiki/Bayes%27_theorem">Bayes' theorem</a>William P. Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0tag:blogger.com,1999:blog-3020437366892008573.post-61174682731586123672010-05-10T10:33:00.000-07:002010-05-10T10:54:33.346-07:00New Algorithms from NumentaMy study of Machine Understanding was on pause for a couple of weeks while I compiled an index for an 802.11n networking book. On May 5 I received a <a href="http://www.numenta.com/">Numenta</a> Newsletter, the key point of which is that Jeff Hawkins and crew have been working on a better algorithm for their HTM systems. Sadly, I still have not gone into the details of the old algorithm!<br /><br />I'll quote the key passage from Jeff:<br /><br />"Last fall I took a fresh look at the problems we faced. I started by returning to biology and asking what the anatomy of the neocortex suggests about how the brain solves these problems. Over the course of three weeks we sketched out a new set of node learning algorithms that are much more biologically grounded than our previous algorithms and have the promise of dramatically improving the robustness and performance of our HTM networks. We have been implementing these new algorithms for the past six months and they continue to look good."<br /><br />Sure. 
Even my own limited reading of mostly outdated neurology texts seemed to indicate that the early versions of HTM are simplistic (compared to systems of human brain neurons). The new version, styled FDR (Fixed-sparsity Distributed Representation), is somewhat more complicated, but Jeff believes it is more capable. In particular, it deals better with noise and variable-length sequences.<br /><br />On the other hand, we are certainly hoping to get machines to actually understand the world without having to duplicate (in software) a human brain molecule by molecule.<br /><br />Jeff gave a lecture on the new algorithm at the University of British Columbia, which will have to do for the rest of us until details are posted at the Numenta web site:<br /><br /><a href="http://www.youtube.com/watch?v=TDzr0_fbnVk">http://www.youtube.com/watch?v=TDzr0_fbnVk</a><br /><br />See also my <a href="http://www.openicon.com/mu/mu_main.html">Machine Understanding</a> main web page.<br /><br />In the meantime I intend, in addition to doing my own thinking & tinkering, to resume my program of going through the already-posted, earlier version examples of HTM.William P. Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0tag:blogger.com,1999:blog-3020437366892008573.post-45086818820712332302010-04-06T09:53:00.000-07:002010-04-06T10:02:43.593-07:00Songbirds, Genes, and NeuronsI just read a <em>New York Times</em> article, <a href="http://www.nytimes.com/2010/04/06/science/06bird.html?ref=science">From a Songbird, New Insights Into the Brain</a> by Nicholas Wade, and it reminded me of a number of machine understanding issues. So I am going to take a break from my series on the Hierarchical Temporal Memory (HTM) model and muse on intelligence, understanding and song.<br /><br />The article gives a minimum of information on how genes actually affect the ability of a bird to learn and sing a song. 
The key revelations of the article are that the <a href="http://en.wikipedia.org/wiki/Zebra_finch">zebra finch</a> (Taeniopygia guttata) has had its <a href="http://en.wikipedia.org/wiki/Genome">genome</a> decoded and that about 800 genes change their activity levels in neurons when the finch sings. The article implies that defects in these genes might interrupt singing ability, just as mutated FOXP2 genes in humans cause speech defects. In particular, the bird version of FOXP2, if defective, prevents songbirds from singing.<br /><br />This would seem to go against my basic understanding of how systems of neurons work, which I like to think is up with the current scientific consensus. Once a basically functioning neural network is in place, I thought, genetic activity becomes background activity. Of course the genes would function just like they do in any cell, releasing instructions for making proteins that regulate cell activity. And maybe some of the 800 genes mentioned in Wade's article are ones that would up-regulate or down-regulate any neural activity, not just songs, or learning. But according to David F. Clayton, "these transcripts don't result in the cells producing proteins in the usual way. Instead they seem to modulate the activity of other genes involved in listening."<br /><br />My (learned from textbooks) model is: genes have blueprints for several types of neurons with varying synapses and neurotransmitters and receptors. Signals are conducted by reasonably well understood mechanisms involving membrane potentials along the neurons and either chemical or electrical transmission at synapses. Genes in the neuron are just caretakers once a system is set up. Learning results from a strengthening or weakening of synaptic thresholds. 
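In the textbook picture, that strengthening can be written as a simple correlational update: a weight grows when its pre- and post-synaptic neurons are active together, and otherwise slowly decays. A deliberately stripped-down, rate-based sketch, with arbitrary toy constants:

```python
# Rate-based sketch of correlational ("fire together, wire together")
# weight updates. Learning rate and decay are arbitrary toy values.
def update_weights(weights, pre, post, lr=0.1, decay=0.01):
    """Strengthen w[i][j] when pre-neuron j and post-neuron i are co-active."""
    return [
        [w + lr * post_i * pre_j - decay * w
         for w, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]

weights = [[0.0, 0.0]]            # one post-neuron, two pre-neurons
pre, post = [1.0, 0.0], [1.0]     # only the first input is co-active
for _ in range(50):
    weights = update_weights(weights, pre, post)
print(weights)   # the first weight grows toward lr/decay; the second stays 0
```

The co-active connection strengthens toward an equilibrium set by the learning rate and decay, while the silent one never moves — which is exactly the mechanism in question.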
This is called Hebbian learning, and while there are some theories about how Hebbian learning works at the molecular level, at this point I don't take them as proven.<br /><br />If the article is true as presented, then individual neurons are more complex than I thought. It is implied that many neurons can function just fine with a mutated FOXP2 gene (every gene would be in every neuron, in fact in every cell), but not neurons that are involved in learning songs. Other neurons learn just fine.<br /><br />What would distinguish a song-learning neuron from a muscle-coordination-learning neuron? I don't know.<br /><br />As is typical with the New York Times, they want to keep you in their ad ghetto, so they provide no link to the research report, but they say it is in the current issue of Nature. Here is the link: <a href="http://www.nature.com/nature/journal/v464/n7289/pdf/nature08819.pdf">The genome of a songbird</a>William P. Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0tag:blogger.com,1999:blog-3020437366892008573.post-66926144668985523262010-04-05T18:17:00.000-07:002010-04-05T18:26:59.531-07:00Bitworm HTM Example Program, Part 3: Spatial and Temporal Pool OverviewIn <a href="http://www.openicon.com/mu/mu_blog/2010/mu_03_31_2010.html">Understanding Bitworm Part 2</a> I wrote: "Among the other parameters of CreateNode you can see spatialPoolerAlgorithm and temporalPoolerAlgorithm. I don't think I have used "pooling" yet. Remember I wrote about quantization points? [See <a href="http://www.openicon.com/mu/mu_blog/2010/mu_03_04_2010.html">How do HTMs Learn?</a>] There are a number of available points both for spatial and temporal patterns in the unsupervised nodes. They need to be populated, and they may change during the learning phase. 
Pooling appears to be NuSpeak for this process; a pooler algorithm is the code that matches up incoming data to quantization points."<br /><br />To learn about the pooling algorithms I went to the <em>Numenta Node Algorithm Guide</em>, which is not at the Numenta web site, but installs with NuPIC under \Program Files\Numenta\nupic-1.7.1\share\doc\NodeAlgorithmsGuide.pdf.<br /><br />There are two node types that implement the NuPIC learning algorithms:<br /><br />SpatialPoolerNode<br />TemporalPoolerNode<br /><br />Some confusion might exist because in more general Numenta discussions a node is treated as a single entity, but both the spatial and the temporal node are needed to create a functioning general node. When the unsupervised node in Bitworm is created with CreateNode(Zeta1Node,...), in effect both a SpatialPoolerNode and a TemporalPoolerNode are created to get full functionality. They refer to both node types being in the same level of the HTM hierarchy. But you can design more complicated networks by arranging SpatialPoolerNodes and TemporalPoolerNodes in an HTM as needed, rather than always pairing them on a level.<br /><br />"Spatial pooling can be thought of as a quantization process that maps a potentially infinite number of input patterns to a finite number of quantization centers." These centers are what Numenta's other literature calls quantization points. Data, in our HTM world, has a spatial aspect. This need not mean change along a physical spatial dimension; "space" here has a more general sense. For instance, the space might be a range of voltages, or sets of voltages from an EKG. Spatial data usually varies so complexly that we are only interested in the data that is created by objects, or causes. Spatial pooling groups the data into a limited number of causes (or guesses about causes).<br /><br />Temporal pooling does the same thing with the patterns (objects) identified by the spatial pooler over time sequences. 
"If pattern A is frequently followed by pattern B, the temporal pooler can assign them to the same group."<br /><br />A group of nodes forming an HTM level may be able to form invariant representations of objects by combining spatial and temporal pooling. If it can, it passes these representations up the hierarchy.<br /><br />Once learning is achieved, the nodes can be used for inference: they can identify new data as containing patterns that have already been learned.<br /><br />For now I will focus on the learning phase, since the inference phase is relatively easy to understand if you understand how learning takes place.<br /><br /><br /><h4 align="left">SpatialPoolerNode</h4>I just realized the paper I am reading does not actually give the algorithms used. However, the key algorithm is probably related to the maxDistance parameter. Distance here could be ordinary distance, but it is more likely to be distance within a generalized, possibly many-dimensional, heterogeneous pattern space. All kinds of problems leap to mind for writing such a generalized algorithm. I would bet that space/data-specific algorithms would really help here (sound vs. sight vs. spatial orientation of human fingers), but perhaps if the quantization is always done before the data is fed in, it is just a matter of matching numbers. Anyway, if you have a distance function, you can group the spatial patterns as falling around a set of centers. These centers are your quantization points. As discussed elsewhere, these points are flexible; if a lot of patterns fall close to each other, you might want to tighten up the distance parameter, because otherwise you don't use your full allocation of quantization points. That should happen automatically, but either it doesn't, so you need to set the maxDistance parameter, or it does but you still have the option of disagreeing with the automatic or default settings.<br /><br />Your number of quantization points is set by maxCoincidenceCount. 
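To make that concrete, here is a toy sketch of my own (not Numenta's actual pooler code) that matches input patterns to quantization centers using a distance threshold and a cap on the number of stored centers; the parameter names mirror maxDistance and maxCoincidenceCount, but the logic is simplified:

```python
import math

def distance(a, b):
    # Ordinary Euclidean distance; a real system might need a
    # space-specific metric, as discussed above.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class ToySpatialPooler:
    """Toy quantizer illustrating the idea, NOT Numenta's algorithm."""

    def __init__(self, max_distance, max_coincidence_count):
        self.max_distance = max_distance
        self.max_count = max_coincidence_count
        self.centers = []  # learned quantization points

    def learn(self, pattern):
        # Pool the pattern with the first center within max_distance;
        # otherwise store it as a new center if there is room.
        for i, center in enumerate(self.centers):
            if distance(pattern, center) <= self.max_distance:
                return i
        if len(self.centers) < self.max_count:
            self.centers.append(list(pattern))
            return len(self.centers) - 1
        # Pool is full: fall back to the nearest existing center.
        return min(range(len(self.centers)),
                   key=lambda i: distance(pattern, self.centers[i]))

pooler = ToySpatialPooler(max_distance=0.5, max_coincidence_count=8)
first = pooler.learn([0.0, 0.0])   # becomes center 0
second = pooler.learn([0.1, 0.0])  # within 0.5 of center 0, pooled with it
third = pooler.learn([3.0, 3.0])   # far away, becomes center 1
```

With Gaussian tuning (the sigma parameter, below) the hard threshold would presumably be replaced by a graded match against each center, but the grouping idea is the same.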
"Storing too few coincidence patterns can result in loss of accuracy due to loss of information. Storing too many coincidence patterns can result in lower generalization and longer training times."<br /><br />You can also set the sigma parameter. Here's another insight into the algorithm: "each input pattern is compared to the stored patterns assuming that the stored patterns are centers of radial basis functions with Gaussian tuning. The sigma parameter specifies the <a href="http://www.robertniles.com/stats/stdev.shtml">standard deviation</a> of the <a href="http://en.wikipedia.org/wiki/Gaussian_distribution">Gaussian</a> [distribution]." So this would work, along with maxDistance, in matching incoming data patterns to existing quantization points.<br /><br />The clonedNodes parameter allows a set of spatial nodes to use the same coincidence patterns. This allows all the nodes in a level to detect the same causes. In vision those could be moving lines, spots, etc.<br /><br />The spatial pooler nodes take inputs with the bottomUpIn parameter. The spatial pattern outputs in inference mode are in bottomUpOut; outputs in learning mode go to a temporal pooler.<br /><br /><br /><h4 align="left">TemporalPoolerNode</h4>Temporal pooling has more options than spatial pooling, in particular offering parameters for both first-order and higher-order learning.<br /><br />Your number of temporal groups, or time quantization points, is set by requestedGroupCount.<br /><br />You can select from a variety of algorithms to compute output probabilities with the temporalPoolerAlgorithm parameter, but it has no impact on the learning algorithm.<br /><br />There are a number of sequencer parameters that allow control of the algorithm. sequencerWindowCount allows for multiple stages of discovery (the default is 10). sequencerWindowLength allows segmentation of the input sequence to look for patterns. 
sequencerModelComplexity apparently allows you to adjust how the recognizable patterns are balanced between the spatial and temporal dimensions. Some objects produce mainly spatial patterns, others mainly temporal, and most combine the two to some degree.<br /><br />As with SpatialPoolerNode, you can clone the nodes if you desire. bottomUpIn takes the data in from one or more spatial pooler nodes. bottomUpOut is the resulting vector of real numbers representing "the likelihood that the input belongs to each of the temporal groups of this node."<br /><br />In addition to parameters, TemporalPoolerNode takes a command: predict, but it works only in inference mode.<br /><br /><strong>Conclusion</strong><br /><br />Despite not revealing the details of the algorithms, the Guide, plus the previous materials I read, gave me a good overview of what the algorithms need to achieve. I am pretty sure that I could write algorithms that do approximately what the Numenta pooling algorithms do, but since they have been playing with this for years, I would rather catch up by examining the code inside the Numenta classes.<br /><br />See also: <a href="http://www.openicon.com/mu/mu_blog/2010/mu_03_08_2010.html">More on Internal Operations of Nodes</a>William P. Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0tag:blogger.com,1999:blog-3020437366892008573.post-53158173571434028032010-03-31T18:24:00.000-07:002010-03-31T18:38:04.937-07:00Understanding the Bitworm NuPIC HTM Example Program, Part 2: Network Creation OverviewNow that Bitworm is running (See <a href="http://www.openicon.com/mu/mu_blog/2010/mu_03_29_2010.html">Bitworm Part 1</a>), there are a variety of options. In the <a href="http://www.numenta.com/for-developers/documentation.php">Getting Started</a> document the next steps are running Bitworm with "temporally incoherent data" and then with noisy data. 
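As a rough illustration of what those two exercises involve (my own sketch, not the actual Bitworm data generator), noise can be added by flipping random bits in each input vector, and temporal coherence can be destroyed by shuffling the order of the frames:

```python
import random

def add_noise(vector, flip_prob, rng):
    # Flip each bit independently with probability flip_prob.
    return [bit ^ 1 if rng.random() < flip_prob else bit for bit in vector]

def make_incoherent(frames, rng):
    # Shuffle the frames so any temporal structure is destroyed,
    # while the set of spatial patterns stays the same.
    scrambled = list(frames)
    rng.shuffle(scrambled)
    return scrambled

rng = random.Random(42)  # seeded for repeatability
worm = [0, 1, 1, 1, 0, 0, 0, 0]  # one toy 8-bit "worm" frame
noisy = add_noise(worm, 0.25, rng)
# A toy sequence of the worm "moving" across the input field:
frames = [[0] * k + [1, 1, 1] + [0] * (5 - k) for k in range(6)]
scrambled = make_incoherent(frames, rng)
```

The point of both exercises is that the spatial pooler should still find the same quantization centers, while the temporal pooler's groupings degrade.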
We could go to the data generation functions and play with them, then see how Bitworm reacts. I am more interested in how the network is created, and how it functions internally. An overview of this is covered in "Creating the Untrained HTM Network File" (starting on page 21 of Getting Started).<br /><br />One thing I found helpful is looking at the set of programs in \Numenta\nupic-1.71\share\projects\bitworm\runtimeNetwork\. These include what appears to be an older version of RunOnce.py that uses CreateNetwork.py for network creation. In the "plain" version of RunOnce the network creation segment has just four lines of code:<br /><br /><span style="font-family:verdana;color:#990000;">bitNet = Network()<br />AddSensor(bitNet, featureVectorLength = inputSize)<br />AddZeta1Level(bitNet, numNodes = 1)<br />AddClassifierNode(bitNet, numCategories = 2)</span><br /><br />AddSensor(), AddZeta1Level(), and AddClassifierNode() are imported functions from nupic.network.helpers. They don't seem to be used other than for Bitworm, so they are worth discussing only in the context of understanding the node structure of Bitworm. This network appears to have 4 nodes in the Getting Started (page 22) illustration, but in CreateNetwork.py we find five listed: the sensor node, the category sensor node, an unsupervised node, a supervised node, and an effector node. 
Getting Started gives 3 of the nodes the same names, but instead of supervised and unsupervised it refers to bottom-level and top-level nodes.<br /><br />Jumping ahead in Getting Started, we find that bitNet = Network() does indeed create an HTM instance that nodes can be added to and arranged in.<br /><br />The runtime version replaces these with a single command (but a lot more parameters):<br /><br /><span style="font-family:verdana;color:#990000;">createNetwork(untrainedNetwork = untrainedNetwork,<br />inputSize = inputSize,<br />maxDistance = maxDistance,<br />topNeighbors = topNeighbors,<br />maxGroups = maxGroups)</span> <p></p>CreateNetwork.py can also be found in the runtime directory. Open it, and the first thing you see is that CreateNetwork starts by importing nupic.network. So there is a set of one or more functions or classes we can use to get an overview; we'll look inside them later, if necessary. The following line of code gives us our function parameters, some of which are set specifically for Bitworm. So CreateNetwork.py is not a general-purpose HTM creation function.<br /><br /><span style="font-family:verdana;color:#990000;">def createNetwork(untrainedNetwork,<br />inputSize = 16,<br />maxDistance = 0.0,<br />topNeighbors = 3,<br />maxGroups = 8):<br /></span><br />Next we have some agreement with the plain RunOnce.py:<br /><br /><span style="font-family:verdana;color:#990000;">net = Network()</span><br /><br />Network() is an imported function that creates the overall data structure for the HTM.<br /><br />Nodes are created with the CreateNode() function. The type of node - sensor, category sensor, unsupervised (Zeta1Node), supervised (Zeta1TopNode), or effector - is chosen with the first parameter of CreateNode(). Among the other parameters of CreateNode you can see spatialPoolerAlgorithm and temporalPoolerAlgorithm. I don't think I have used "pooling" yet. Remember I wrote about quantization points? 
[See <a class="navigation2" href="http://www.openicon.com/mu/mu_blog/2010/mu_03_04_2010.html">How do HTMs Learn?</a>] There are a number of available points both for spatial and temporal patterns in the unsupervised nodes. They need to be populated, and they may change during the learning phase. Pooling appears to be NuSpeak for this process; a pooler algorithm is the code that matches up incoming data to quantization points.<br /><br />I did not get as far as I would have liked today, but I am beginning to see some structure, and dinner is calling. Instead of calling this entry HTM Creation Classes and Functions, I'll call it an Overview.William P. Meyershttp://www.blogger.com/profile/14258196216689767630noreply@blogger.com0