Monday, August 20, 2012

ApeWorm Brain Overview

(part of the series Aping As the Basis of Intelligence)

Figure 2 gives an overview of the ApeWorm "central nervous system" or control system. 

Two eyes, or retinas, gather information about the exemplar, which is analyzed to form an exemplar inner model. The ApeWorm also analyzes its internal sensors to construct a self inner model, or self image. If the two models match, the Ape control function registers success and signals the muscle control functions to stop movement. It may also reinforce the "synapses" or weights that led to the success, including any temporal sequence involved. If there is no match, muscle control allows movement to continue.
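
In pseudocode terms, one tick of that loop might be sketched as below. The function and variable names are mine, for illustration only, and the reinforcement rule is a placeholder, not a claim about what the eventual neural model will do:

```python
def control_step(exemplar_model, self_model, weights, recent_trace):
    """One tick of the ApeWorm control loop: compare the exemplar inner
    model to the self inner model, then stop-and-reinforce or keep moving."""
    if exemplar_model == self_model:
        # Match: reinforce the "synapses" (weights) that led here,
        # including any temporal sequence recorded in recent_trace.
        for key in recent_trace:
            weights[key] = weights.get(key, 0.0) + 0.1  # placeholder rule
        return "stop"
    # No match: muscle control allows movement to continue.
    return "continue"
```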

It should be noted that the compartmentalization of functions in Figure 2 need not necessarily correspond to how the system is constructed. A neural model might make it possible, or even necessary, to tightly integrate two or more, or even all, of the functions that seem separate in this analysis.

Wednesday, July 11, 2012

Constructing the ApeWorm World

In this draft I will try to avoid using any particular programming language to illustrate the mechanisms of action. If necessary, I will use pseudocode. When the analysis is finished and it is time to try to run simulations I will try to remember to insert links to the code samples.

The ApeWorm world (or dual worlds) does not need to be defined explicitly. The upper and lower limits for the ApeWorm coordinates will suffice. As shown in Figure 1, the allowable coordinates run from 0 to 4 on both axes.

Figure 1: ApeWorm

ApeWorms themselves have four segments, but require five coordinate points to describe fully. Each point has an x1 component and an x2 component. The points will be called A0(x1,x2), A1(x1,x2), A2(x1,x2), A3(x1,x2), and A4(x1,x2).
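
For simulation purposes, the god's-eye data is just a list of those five points, which we can bounds-check against the world limits. This is a sketch with names of my own choosing:

```python
# God's-eye view of an ApeWorm: five points A0..A4, each inside the world.
GRID_MIN, GRID_MAX = 0, 4

def valid_points(points):
    """points: a list of five (x1, x2) tuples, A0 through A4.
    Returns True when every point lies within the allowed coordinates."""
    if len(points) != 5:
        return False
    return all(GRID_MIN <= x1 <= GRID_MAX and GRID_MIN <= x2 <= GRID_MAX
               for (x1, x2) in points)
```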

The data we gods will use to track ApeWorms are not the same data that the worms themselves keep. An ApeWorm will know where its head, or segment one, is on the world grid; in other words, it will know A0(x1,x2) and A1(x1,x2). Each intersegment node (joint) will be able to convey to the control system one of three states: left, center, or right (L, C, R). Handedness will be determined from the higher-numbered segment; in other words, for joint 1 we look toward segment 1 from segment 2. Thus in the curled ApeWorm example in Figure 1, the joints are all in the right or R configuration.
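
Given A0, A1, and the three joint states, the worm's full set of points can be reconstructed. In the sketch below the turn convention (L turns 90° left, R turns 90° right, walking from A0 toward A4) is an assumption of mine for illustration; the handedness convention above, which looks back from the higher-numbered segment, may flip the signs:

```python
# Rebuild the five body points from the worm's self-knowledge:
# head points A0, A1 plus three joint states.
TURN = {"L": lambda dx, dy: (-dy, dx),   # rotate 90 degrees counterclockwise
        "C": lambda dx, dy: (dx, dy),    # center: keep going straight
        "R": lambda dx, dy: (dy, -dx)}   # rotate 90 degrees clockwise

def body_points(a0, a1, joints):
    """a0, a1: (x1, x2) tuples; joints: three states from {'L', 'C', 'R'}."""
    points = [a0, a1]
    dx, dy = a1[0] - a0[0], a1[1] - a0[1]
    for state in joints:
        dx, dy = TURN[state](dx, dy)
        x, y = points[-1]
        points.append((x + dx, y + dy))
    return points
```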

The joints are controlled by sets of two opposing virtual muscles. When there is no signal to a muscle, it is relaxed, and it contracts as signals increase. An algorithm will determine which of the three states the joint is in based on the relative strengths of the control signals to the muscles.
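
A minimal version of that algorithm, with an assumed threshold rule of my own (the joint bends toward whichever muscle pulls harder, and stays centered when the difference is small), might be:

```python
def joint_state(left_signal, right_signal, threshold=0.5):
    """Map two opposing virtual-muscle signals to a joint state.
    A relaxed muscle has signal 0; signals grow as contraction increases."""
    if left_signal - right_signal > threshold:
        return "L"
    if right_signal - left_signal > threshold:
        return "R"
    return "C"
```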

The ApeWorm's ability to "view" another ApeWorm is stereoscopic but otherwise simple. The virtual retinas coincide with the x1 and x2 axes. Each segment of these axes can only "see" a segment in its row or column, and if there are two segments in a column, it can only see the closest one. [But a second segment in a column might be inferred by the control system.] The sensor can see the segment number, if any, and the distance, that is, which cross-coordinate the segment lies on. Thus in the upper-left example in Figure 1, the sensors on the x1 (horizontal) axis will record nothing. The x2 sensor closest to the origin will record segment 4 at x1 = 1. The x2 sensor between 1 and 2 will record segment 3, at x1 = 2, and so on.
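
One retina can be sketched as follows. Here I represent each exemplar segment by its number and its two endpoints; that representation, and the function name, are assumptions for illustration:

```python
def left_retina(segments, grid=4):
    """Left-side retina: one sensor per unit row, looking along x1.

    segments: dict mapping segment number -> ((x1a, x2a), (x1b, x2b)).
    Returns, for each row band 0..grid-1, the nearest broadside
    (vertical) segment as (segment_number, x1), or None if the sensor
    sees nothing in that row.
    """
    readings = []
    for row in range(grid):
        seen = None
        for num, ((x1a, x2a), (x1b, x2b)) in sorted(segments.items()):
            vertical = x1a == x1b                      # broadside to this eye
            spans_row = min(x2a, x2b) <= row < max(x2a, x2b)
            if vertical and spans_row:
                if seen is None or x1a < seen[1]:
                    seen = (num, x1a)                  # closest segment wins
        readings.append(seen)
    return readings
```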

Next: ApeWorm Brain Overview (to be constructed)

Tuesday, July 10, 2012

ApeWorm, an Initial Machine Ape Project

The core of the system we seek is quite complex. We would like to minimize the inputs and outputs in order to focus our first project on the core aping system. At the same time, we want a general solution that is scalable to projects with larger, more complex input/output and memory requirements.

We will call our first attempt ApeWorms. These creature constructs will exist on a 5 x 5 grid and will have, for bodies, four connected line segments, the ends of which line up on the grid. Any two adjacent line segments will be able to have three relative positions: left, straight, and right {L, S, R}. An ApeWorm cannot have two segments overlapping each other. Figure 1 shows a few possible ApeWorm configurations.
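
Because the segments are unit length and axis-aligned, two segments overlap exactly when they connect the same pair of grid points, so the no-overlap rule can be checked directly. A sketch, with a worm represented by its five endpoints (my representation, not a settled design):

```python
def no_overlap(points):
    """Check the rule that no two segments overlap.  Each consecutive
    pair of points is one unit segment; overlap means two segments share
    the same (unordered) pair of endpoints."""
    segs = {frozenset(pair) for pair in zip(points, points[1:])}
    return len(segs) == len(points) - 1
```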

Figure 1: ApeWorm

While ApeWorms are physically simple, their neural networks, or control software, can be as complex as we need. What should an ApeWorm know about itself, and what should it be allowed to know about an exemplar ApeWorm, if aping it is to be a non-trivial task requiring an extensible aping machine?

The ApeWorm will have one end designated the zero {0} end, with joints {1, 2, 3} and a {4} end. Each segment will be designated by its terminal joint, {1, 2, 3, 4}. Each joint will signal its relative position, going from the lower to the higher numbers, as L, S, or R (after training). The zero end and joint 1 will be able to sense their positions in the grid, again after training. For illustration purposes joints may be color-designated: 1 black, 2 green, 3 blue, 4 red. [Training adds a degree of complexity, which I suspect will be important as we move to systems more complex than ApeWorm.]

How would the ApeWorm know about its exemplar? Note that it would be a trivial task, if exact information about the exemplar is known, to assign that information to the ApeWorm directly in software. But if we built two ApeWorms in the flesh, so to speak, we would have autonomous systems requiring such information to cross space, so that information about position would have to be encoded in some other form.

We will allow the ApeWorm to have two "eyes" on the Exemplar worm. The eyes (retinas) will be at the bottom and left sides of the grid. Each eye can detect segments that are broadside to it, and their orientation by row or column and by distance from the eye.

It is tempting to ignore the temporal aspect in the first build of ApeWorm. It seems difficult enough to construct a general aping ability when the minimum requirement is to detect the position of the Exemplar worm and then provide for the ApeWorm to conform itself to a copy of that. However, going forward we will want temporal input and temporal aping.

Consider that the aping instinct evolved in animals only after basic lower functions were in place. An animal has to sense the environment, coordinate itself, catch food and escape predators just to survive. All that puts a high premium on temporal capabilities. Aping sits atop all those capabilities, just as language and culture sit atop aping. We have freed our ApeWorms from the need to eat, but we run the danger of making our aping system hard to integrate into robots and more complex programs if it is unable to at least deal with time sequences.

Next: Constructing the ApeWorm World

Monday, June 11, 2012

Specifications for the Language Machine

Aping as the Basis of Intelligence (cont.)


Typically in systems of artificial intelligence designed for language there is a front-end feature detection system. Thus the slight fluctuations in air pressure we call sound are analyzed for features. In the case of human language these features are often quite complex, but at this point they are well-studied. Thus detectors have been devised for common syllables and voice ranges.

In a developing human there are likely some very generalized feature detectors, but they are also very flexible. This would also be true in mammals and birds that have shown they can learn some human words. Thus a human baby can learn a click-based language from Africa, the syllables of modern English, or a tonal Asian language. In effect, feature detectors evolve based on exposure to language.

Voicing is also complex, controlled by a wide range of muscles. It too is learned, and requires considerable practice to achieve perfection. Aping the voices of other humans is the primary method of learning to speak so as to be understood.

Four major input/output streams can be defined for a human-like language machine. There is the audio input from the ears. There is output to a variety of muscles that produce sounds and speech. There are other inputs ultimately external to the body, necessary to provide positive and negative behavior reinforcement, such as touch. There are internal desire (or rejection) type inputs, notably hunger and other discomforts or wants. There is also a need for decision making: given all the other inputs, deciding what sounds to make and when. This decision making could be incorporated into the language machine or it could be external, and probably is some combination of both in humans.
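
The four streams, plus the decision-making function, might be sketched as a minimal interface. All the names here are illustrative assumptions, and the decision rule is only a placeholder:

```python
class LanguageMachine:
    """Minimal sketch of the four I/O streams of a human-like language
    machine, as described above."""

    def __init__(self):
        self.audio_in = []        # audio input from the ears
        self.motor_out = []       # output to sound-producing muscles
        self.reinforcement = []   # external reward/punishment, e.g. touch
        self.drives = []          # internal wants: hunger, discomforts

    def decide(self):
        """Decision making: given all the other inputs, decide what
        sounds to make and when.  Placeholder: stay silent unless some
        internal drive is active."""
        return "vocalize" if self.drives else "silent"
```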

Tuesday, June 5, 2012

Aping as the Basis of Intelligence

Logic was thought to be the key paradigm for human intelligence from Aristotle to Wittgenstein, and then in the early days of Artificial Intelligence as implemented on computers.

Careful consideration shows that aping, the ability to copy an action of another individual, is more central to intelligence. It is as much a mammal trait as a human one. Formal exercises in logic are not needed for one animal to learn behavior from another. The complex neural circuitry created by evolutionary pressures to enhance aping potential eventually became the basis of human language and its formalized reasoning skills known as logic.

Aping consists of observing (usually visually) what some other individual is doing, and then copying that behavior. Humans do this so naturally that we seldom think about it consciously. Practicing logic, unless one has been trained to do so (by aping others), is a much harder skill. Yet logic circuits are easy to implement with electronic parts, while aping another individual requires very complex systems within the brain.

This essay, and the accompanying computer programs and physical systems, is an attempt to create machines capable of aping, which then should be able to handle other complex behaviors usually associated with human intelligence, including language, mechanical skills, reasoning, and possibly consciousness.

The advantage of aping skills goes beyond helping to ensure the survival of an individual animal. With aping, a behavior with survival benefits can become cultural, spanning an unlimited number of generations of animals, without any genetic change specific to the behavior itself. A generalized pool of neurons can become capable of generating a wide and flexible variety of behaviors, giving a species a strong advantage over animals that can only engage in behavior based on pre-programmed neural systems.

Pre-language example. Consider a simple aping process and how it differs from paradigms like ordinary remembered (learned) behavior. A young wolf has always crossed a cold stream by simply plunging through it. One day it sees an older wolf stop at the stream and proceed along the bank to a point where there is a rock in the center of the stream. The older wolf jumps to the rock, then to the far bank, avoiding getting wet while crossing. The young wolf then apes that behavior. Afterwards, when crossing the stream in that area (when not in hot pursuit of prey), the young wolf diverts to the special crossing point.

Note how the aping behavior differs from what might be other reasons for using the rock to cross. The young wolf could learn of the rock crossing by exploration, then would remember the location of the crossing when appropriate. You could argue that following an older animal might be an instinct, so this is not a real example of aping, just a side effect of following.

Aping behavior might have originated in following behavior, and the basic ability to learn from experience, but as more complex situations are involved, it becomes clear that aping is a special case requiring special capabilities.

Supposing this simple example constitutes a form of aping, what can we say about how it is accomplished by the wolf brain? The wolf is capable of recognizing an externalized self. The example wolf, to the aping wolf, represents a possibility for itself. Metaphorically, the aping wolf can see itself crossing the stream using the rock when it sees the example wolf. So the wolf-brain is capable of contemplating (in a very simple, non-philosophic way) the future. It can choose to ape the example, or it can refuse to cross the stream, or it can ignore the example and get wet and cold crossing the stream.

Aping, in the infants of actual apes including humans, is probably automatic. Only at a later time will the young apes start choosing whether or not to ape particular behaviors. There is typically outside support for aping in human children. We reward them when we like their aping efforts, but may discourage them when we don't like their efforts (as when they engage in behaviors culturally reserved for adults).

Consider the "simple" modern act of making a piece of toast with a bread-toasting appliance. A child is not likely to learn to do this by accidentally taking a slice of bread, sticking it into the appliance, and pushing down the lever. A child learns this by watching an example, or exemplar. The child understands that doing what an exemplar does leads to the same result. The child sees that taking a piece of bread, placing it in a certain way in the appliance, and pressing the lever results in the making of toast. Usually, if sufficiently coordinated, a child can do this on the first try. Aping often results in some level of success in achieving a goal on the first try.

To ape an exemplar the aping child must already have similar levels of muscle control and sense of where its body is in relation to its surroundings. It must understand that aping the exemplar's movements (or other behaviors like vocalizing) achieves a desired result. If an aping attempt is made that does not achieve the desired results, a child may conclude that something is missing from the aping process, a subtlety or trick.
To ape, the brain must have a number of capabilities, which might be summed up as the ability to coordinate the body of the subject with the body of the exemplar.

In humans, when children are quick to ape adults, we call them intelligent. If they are slow to imitate adult behaviors, we call them slow or less intelligent.

Language learning example. Human babies start learning to imitate vocalizations soon after birth. Vocal fussing often results in rewards, like feeding, encouraging further vocalizations. The process may seem like a long one, given that mechanical devices that can record and play back sounds and language have long existed. But there is a fundamental difference between the mechanical playback model and the aping vocalization model, one critical to the reconstruction of how humans are capable of intelligent behavior, including understanding how the world works. [I prefer Machine Understanding as a term for what we are working towards with computerized robots capable of aping, because Artificial Intelligence has largely ignored a set of crucial problems that logic circuits have difficulty handling.]

The human brain does not appear to be a recording medium in the sense that a phonograph, magnetic tape machine, or silicon-memory based sound recording and playing device is. The human ability to immediately repeat a phrase of language or hum a melody does not arise from a like mechanism. Humans have fairly good memories, but a weak ability to memorize.

Consider the problem, for the human brain, of learning to repeat its first few spoken words, say "mama, papa, no, puppy, bye, milk." In the ear there are a number of sensory cells that respond to different frequencies of sound. They send nerve impulses towards the brain. At the other end we have the sound-making machinery: lungs, vocal cords, and mouth. An infant screaming soon after birth indicates that the machinery works, but that is nothing like language. Months of hearing and experimental vocalizations follow. There may be feedback, positive or negative, from parents and other exemplars, as well as careful repetition of simple words in the hope this will induce aping by the child.

In the end, however, success is the ability of the child to say a word that sounds, to its own ear, like the kind of words spoken by the exemplars. It will not be identical in its fine structure, nor will it ever be; in fact, it will be identifiably different: an individual voice. Six words mastered, many more will follow, all aped after exemplars. In addition, the words will have meaning: they will coordinate with objects or actions in the external world (or the internal world, as with "more" or "hurts").

The neural machine that coordinates heard words and the speaking of words with an "ego" must be very complex and very capable. Recreate that machine and we would have a model that, perhaps with some specialized variations, would be able to ape visual input to body positions, or anything else required of it.

Sunday, January 22, 2012

Predictive Memory and Particle Filters

I keep thinking about the Particle Filters AI method, so I might as well write some of it down. Note that in AI, Particle Filters has a particular meaning, different from the physics of particles in a real-world simulation or in a video game. In AI, particle filtering is a method for localization of an object, which could be a large physical object like a robot. It involves probabilistic methods (Monte Carlo techniques) for predicting where the object might be, combined with sensory feedback. Typically it requires a pre-existing map of the territory (which for a robot is real, but which could be highly abstract in other cases) and a method for matching the feedback to the map. It has been most notably used as an essential software component of the Google self-driving car. A somewhat more extensive overview can be found at Particle Filters.

The reason I keep thinking about it is that I started noticing how I consciously (and probably subconsciously) localize myself in the physical world, most notably when I am navigating in the dark in my house. I do this to avoid backtracking: turning on a light, going back to the prior switch and turning it off, and so on. I know my house well, but in just a few feet in the dark it is easy to get a bit off course and bump into furniture, or into a wall where I expected a doorway. The stairs can be tricky too.

So I walk a few feet in what I guess is the right direction based on experience. Then I reach out to touch what I expect is a wall or piece of furniture. Or I note a change from rug to bare floor, or spot an LED. Usually I get the feedback I expect, but sometimes I find I am off course and need to make a correction.

In Particle Filter Localization terms, I project a probabilistic range of particles representing where I, the human robot, might be next. Then I check. Often I know exactly where I am after a check, but sometimes all I know is that I am not near a wall or furniture piece. Then I make another mental projection, take a step or two depending on my expectations and level of caution, and try again to touch something. I go through similar mental gymnastics when driving a car or taking a daylight walk, but the feedback is visual rather than touch.
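
That predict, check, correct loop is, in miniature, one cycle of a particle filter. A one-dimensional sketch (positions along a hallway, a touch sensor that fires near a known wall; the wall position, sensor model, and weights are all illustrative assumptions of mine):

```python
import random

def particle_filter_step(particles, step, touched_wall, wall=10.0, noise=0.5):
    """One predict-weight-resample cycle of a 1-D particle filter.

    particles: candidate positions for where I might be.
    step: intended movement; touched_wall: did the touch sensor fire?
    """
    # Predict: move every particle by the intended step, plus motion noise.
    moved = [p + step + random.gauss(0, noise) for p in particles]

    # Weight: particles consistent with the touch reading score higher.
    def weight(p):
        near = abs(p - wall) < 1.0
        return 0.9 if near == touched_wall else 0.1

    weights = [weight(p) for p in moved]
    # Resample: draw new particles in proportion to their weights.
    return random.choices(moved, weights=weights, k=len(moved))
```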

I went through a similar analysis process after reading Jeff Hawkins' theory of predictive memory in On Intelligence. The theory is that the brain largely operates by making constant predictions, based on memory of historical patterns, and checking them against sensory input. When input varies from predictions, neurons send special signals about detecting novelty to higher brain centers that deal with novelty. For instance, as you read this your brain knows a number of words might be included in this zebra. See, you expected "sentence" or maybe "essay" or the like, you were not expecting "zebra", and so the normal flow of processing was interrupted.

I have not seen a neural network model for particle filters. It could be that the resemblance between how human brains seem to work and particle filters is only superficial. More generally, we are talking about feedback systems, which can take a number of forms. Even very primitive animals that lack differentiated neurons have feedback systems, so there could be many types of neural feedback schemes operating in the human brain.

I often notice my dog doing something that makes me think that most mammals, including pre-sapiens homo species, can do that. Yet when these processes, say chasing a rabbit, are abstracted, they can become quite complex. I can imagine chasing a rabbit. I can plan to chase a rabbit into a trap. I can pursue something more abstract, say a solution to a construction problem or even a math problem in a way that resembles chasing a rabbit. Let's see, the answer to the solution of this differential equation has disappeared, it is not obvious. I'll beat the bushes, using three techniques known to help solve this type of equation, most-likely technique first. Hopefully one technique will put the problem in a form that I recognize from the standard map for solving such equations.

How would particle filters be implemented with neural networks? Let me know your thoughts.

Monday, December 26, 2011

Stanford AI Class wrap up

I managed to muddle my way through the free Internet version of the Stanford Introduction to Artificial Intelligence (AI) course. "Congratulations! You have successfully completed the Advanced Track of Introduction to Artificial Intelligence ..." says my Statement of Accomplishment.

Before putting on my analyst mask, I would like to thank Stanford University and particularly the instructors, Sebastian Thrun and Peter Norvig, for conducting the course, and in particular for making it free of charge. I am hoping they will leave the instruction videos up for a while; there are some I would like to go over again.

I got a good review of Bayes Rule, basic probability, and some simple machine learning algorithms. I had not worked on Planning algorithms before, so that was of some interest. Markov Models had always been a bit vague to me, so that section helped me nail down the idea. Games and game theory seem to have made little progress since the 1940s, but I guess they have to be covered, and I did get clear on how MinMax works. Computer vision seemed kind of weak, but then you can't assume students know basic optics, and at least we learned how to recognize simple features. Robotics was a prior interest for me, and I did not know about Thrun's obvious favorite, Particle Filters, which are a useful paradigm for spatial positioning (aka localization).
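
Of the algorithms mentioned, MinMax (minimax) is the easiest to show in a few lines. This toy version, scoring a hand-made game tree, is only my illustration of the idea, not the course's code:

```python
def minimax(node, maximizing):
    """Score a game-tree node.  Leaves are integer payoffs; internal
    nodes are lists of children.  The maximizing player picks the largest
    child score, the minimizing player the smallest."""
    if isinstance(node, int):
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)
```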

The last two units were on Natural Language Processing, and that is a good place to start a critique (keeping in mind that all this material was introductory). Apparently you can do a lot of tricks processing language, both in the form of sounds/speech and written text, without the algorithms understanding anything. They showed a way to do pretty decent language-to-language translations, but the usefulness depends on humans being able to understand at least one of the languages.

Plenty of humans do plenty of things, including paid work, without understanding what they are doing. I suppose that could be called a form of artificial intelligence. Pay them and feed them and they'll keep up those activities. But when people do things without understanding (I am pretty sure some of my math teachers fell into that category), danger lurks.

The Google Car that drives itself around San Francisco (just like Science Fiction!) just demonstrates that driving a Porsche proves little about your intelligence capabilities. Robot auto-driving was a difficult problem for human engineers to solve. They were able to solve it because they understood a whole lotta stuff. Particle Filters, which involve probability techniques combined with sensory feedback to map and navigate an environment, are a cool part of the solution. If I say "I understand now: I have been walking through a structure, and to get to the kitchen I just turn left at the Picasso reproduction," I may be using the word understand in a way that compares well with what we call the AI capabilities of the Google Car. Still, I don't think the Car meets my criteria for machine understanding. The car might even translate from French to English for its human cargo, but I still classify it as dumb as a brick.

Hurray! Despite my advancing age, lack of a PhD, less-than-brilliant business model, and tendency to be interested in too many different things to be successful in this age of specialization, no one seems to have gotten to the essence of how humans understand, and are aware of, the world and themselves.

If the human brain, or its neural network subcomponents, did Particle Filters, how would that work? I know from much practice that bumping around in the dark can lead to orientation or disorientation, depending on circumstances. On the other hand the random micro-fine movements of the eye might be a physical way of generating randomness to test micro-hypotheses that we are not normally consciously aware of.

We sometimes say (hear Wittgenstein in my voice) that someone has a shallow understanding of something. "Smart enough to add columns of numbers, not smart enough to do accounting," or "Good at game basics, but unable to make strategic decisions." Let me put it another way: in some ways the course itself was an intelligence test. I imagine it would be very rough for anyone without a background in algebra and basic probability theory. The students in the class already knew a lot, and had to learn difficult things.

I want to know how our bodies, our brains, learn difficult things. The only way I will be able to be sure that I understand how that is done is if I can build a machine that can do the same thing.