Learning and Cognition

Barry Sinervo©1997

Index

Non-associative learning: Habituation and sensitization

Associative Learning

Classical Conditioning

Instrumental Conditioning

Biological Constraints on Learning

Examples of Learning in "Natural Contexts"

Adequacy of Associative Learning Theories and Complex Learning

Cognitive Views of Behavior

Optimal Foraging Theory and Learning

Can animals learn in an optimal way?

Risk Aversive Bees Revisited and Memory Windows

Learning, Behaviorism, and Cognition

Are some behaviors the result of innate programming of genes?

Do animals show intelligence?

Is the animal born a tabula rasa, with its behaviors shaped over a lifetime of experience by conditioning and interaction with its environment?

These questions span the breadth of approaches and ideas concerning the source of animal behavior. We have already seen clear cases of innate programming, or hard-wiring, of the nervous system: complex behaviors result from relatively simple neural circuits (fixed action patterns) that are triggered by sign stimuli. In contrast, imprinting reflects a learning process that takes place within a very short period, but this learning of the mother by offspring undoubtedly relies on some hard-wired circuits devoted to the process. Such a view cannot explain all cases of behavior. It is not just nature, but nurture, or more properly, it is not just genes, but the interaction between genes, the emergent properties of genes (epigenetics), and the interaction of genes with the environment. In this chapter, we will explore the higher-order interactions underlying behavior: learning and cognition.

The theory of learning maintains that the organism is born with relatively flexible neural circuits that have the capacity to be programmed by learning. The circuits become conditioned through trial-and-error associations. Some behaviors could be the result of this learning and conditioning process, whereas others might be the result of true intelligence or, at the very least, cognition. At the highest level, humans use reasoning and abstraction to make decisions. Whether some animals are capable of such higher-level cognitive processes is not certain. We will explore the distinction between learning and conditioning theories, and cognitive views of behavior.

Processes of learning and cognition are by their very nature performance-based. An important aspect to consider in measuring performance is whether or not the animal is motivated to perform a behavior. We must consider motivation in our study of learning and cognition because such an analysis may be confounded by a lack of motivation or by differences in states of arousal between subjects.

At another level, the proximate causal mechanisms underlying motivation may also explain differences in the behavior of individuals in the wild. For example, a subordinate in a troop of baboons is suppressed from engaging in copulation, and such suppression may arise because an important causal agent, testosterone, is at lower levels, or perhaps because stress hormones (glucocorticoids) are at higher levels in subordinates. In contrast, testosterone may be at very high levels in a dominant, which has the motivational state to copulate with most receptive females.

Finally, the proximate causal mechanisms underlying both motivation and learning may also provide powerful explanations of differences in behavior between organisms. For example, male and female songbirds differ in their capacity to learn song. Such constraints on learning arise from the basic neural architecture of songbirds. In cases where female song may be important, for example in the formation of a pair bond in a monogamous species, the relevant regions of the brain have been elaborated by natural selection, and such constraints do not hamper song learning in female birds.


Motivation

The study of motivation entails the study of cause and effect. Behaviorists are interested in the causes of particular behaviors, and the most proximate cause of a specific behavior is a stimulus from the external environment. The second link in the causal chain underlying a behavior is the internal state of the organism, which motivates the behavior. In many cases stimuli are present in abundance, and yet the animal does not respond to them all the time. For example, food may be present much of the time, but an animal does not eat constantly in the presence of food. Internal factors control the level of satiation, and such satiation mechanisms provide negative regulation of feeding behavior.

Of all the internal mechanisms studied, hormones are the best understood aspects of an animal's physiology that provide positive and negative regulation of its behavior. In addition, the endocrine organ controlling a behavior can be removed, interrupting its negative or positive regulation. Such experiments allow one to address the cause-and-effect relations underlying motivation. Equally important in such an endeavour is our ability to measure natural levels of hormones in organisms and correlate changes in those levels with changes in an animal's motivation.

For example, one can remove the testes, the endocrine organs responsible for testosterone production, and inject known quantities of testosterone to study how plasma testosterone governs the response of males to aggressive stimuli (the presence of rival males) or sexual stimuli (the presence of receptive females). Testosterone is present over a long time course and can be considered a general factor governing arousal in an animal, whereas the presence of a male or female can be clearly identified as the external stimulus eliciting the behavior, provided that enough testosterone is present for the underlying motivational state to be reached.

Hormones do not necessarily have to be present for long periods to alter an animal's motivational state. For example, the "fight-or-flight" hormone, adrenalin, affects the motivational state of an animal within minutes or even seconds. Further cues from the stimulus that triggered the surge of adrenalin then push the animal down one of the two alternatives: fight or flight.

The links between the nervous system and the positive and negative feedback of hormones are also fairly well understood. Many protein hormones governing the reproductive cycle are produced by the hypothalamus, which is part of the brain, or by the pituitary, which receives direct input from the hypothalamus. Clearly, the nervous system also plays a central role in effecting behaviors that are elicited by external stimuli.


Learning

Habituation and Sensitization: Non-associative learning

Habituation is among the simplest kinds of learning. Animals respond to many aversive stimuli, such as being touched by a foreign object, by recoiling or retracting. For example, a snail will retract into its shell in response to being touched by a probe. However, with continued touching the snail becomes habituated and eventually stops retracting into its shell. In many cases the neural correlates of such responses can be clearly identified.

Another example of habituation that we have already considered is input filtering. Anolis lizards habituate to sinusoidal movements quite readily, and such habituation usually takes place within two to three waves of stimuli. Input filtering and the more general process of habituation allow the animal to stop responding to inappropriate or noisy stimuli.
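The decline in responding described above can be caricatured in a few lines. This is a minimal sketch, not a model from these notes: it assumes, hypothetically, that each repeated harmless stimulus multiplies response strength by a fixed factor (`decrement`) and that the animal responds only while strength exceeds a `threshold`. All parameter values are made up for illustration.

```python
def habituation_responses(n_stimuli: int, initial: float = 1.0,
                          decrement: float = 0.5,
                          threshold: float = 0.2) -> list[bool]:
    """Return, for each of n_stimuli repeated stimuli, whether the animal responds.

    Hypothetical model: response strength starts at `initial` and is multiplied
    by `decrement` after every presentation; a response (e.g., a snail retracting
    into its shell) occurs only while strength exceeds `threshold`.
    """
    strength = initial
    responses = []
    for _ in range(n_stimuli):
        responses.append(strength > threshold)  # respond only above threshold
        strength *= decrement                    # habituation weakens the next response
    return responses


print(habituation_responses(5))  # [True, True, True, False, False]
```

With these made-up values the simulated animal responds to three presentations and then stops, loosely echoing the two to three waves reported for Anolis; a smaller `decrement` makes it habituate faster.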

Sensitization is the flip side of habituation. In sensitization, the response to external stimuli becomes elevated as trials progress, rather than depressed as in habituation. For example, the common octopus can be trained to attack a target at the end of a tank, and this response increases after the individual has been fed: octopuses attack more readily in this altered physiological state, that is, they are sensitized to the presentation of the neutral target after feeding.

Associative Learning

Associative learning takes place whenever an animal learns to associate an external event with a change in its own internal state or in its behavior. An animal can also learn to associate an act that it performs with some kind of reward. The first form of learning is classical conditioning; the second is operant, or instrumental, conditioning.

Classical Conditioning

The study of learning has been dominated by the paradigm of associative learning. For example, in classical conditioning an animal is presented with a "neutral" environmental stimulus, such as the ringing of a bell, followed by a motivationally significant event such as feeding. The reader is undoubtedly familiar with the example in which Pavlov conditioned dogs to begin salivating in response to a buzzer, even before they were presented with food.

What happens during the classical conditioning process?

Why does the dog begin salivating in response to a neutral stimulus?

Pavlov argued that food is a natural, or unconditioned, stimulus (US) that triggers a reflex response in the form of salivation, which is likewise the unconditioned response (UR). Conditioning leads to the conditioned stimulus (CS), a buzzer, becoming associated with the reflex or unconditioned response. Pavlov believed that the CS may come to substitute for the US in triggering the UR. Indeed, the dog's response includes more than reflex action: the dog becomes excited and approaches the food dish (or site of feeding). In the example of Pavlovian conditioning the US is positive; however, negative stimuli such as an electric shock could also be used.
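Pavlov's substitution account is verbal. One widely used later formalization, the Rescorla-Wagner rule (not part of these notes), tracks the associative strength V of the CS and raises it on each CS-US pairing in proportion to how surprising the US still is: V ← V + αβ(λ − V), where λ is the maximum strength the US can support. The sketch below assumes a single CS and a hypothetical combined learning rate αβ = 0.3; it illustrates that rule, not Pavlov's own model.

```python
def rescorla_wagner(n_trials: int, learning_rate: float = 0.3,
                    lam: float = 1.0) -> list[float]:
    """Associative strength V of a single CS (e.g., the buzzer) after each
    CS-US pairing, under the Rescorla-Wagner update V <- V + rate * (lam - V).

    `learning_rate` stands for the product alpha*beta of CS and US salience;
    `lam` is the asymptotic strength the US (food) can support.
    """
    v = 0.0                      # the buzzer starts out neutral
    history = []
    for _ in range(n_trials):
        v += learning_rate * (lam - v)   # change is proportional to prediction error
        history.append(v)
    return history


strengths = rescorla_wagner(10)
print([round(v, 3) for v in strengths[:3]])  # [0.3, 0.51, 0.657]
```

Strength rises quickly at first and levels off toward `lam`, because each pairing adds less once the buzzer already predicts the food well.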

How can such learning be viewed in an adaptive context? Processes underlying classical conditioning allow an animal to associate an event in the external environment with an upcoming event that it predicts, and then to prepare for that upcoming event. Developing such correlations between external events permits the animal to make adaptive modifications of behavior.


Instrumental Conditioning

Instrumental conditioning was popularized by B. F. Skinner's invention of the Skinner box. Instrumental conditioning differs from classical conditioning in that the motivationally significant event occurs after the subject performs some behavior, rather than the subject performing some behavior in response to a stimulus that predicts the event. The rat learns that if it presses a bar, it will be fed a pellet of food. Instrumental conditioning is also a form of associative learning, in that the rat comes to associate the bar press with the receipt of a reward.

Even though the Skinner box may be somewhat artificial, the principles of instrumental conditioning apply in nature. For example, a squirrel attempting to crack open nuts learns to open them by a similar process: it modifies its behavior as it finds a technique that gets it to the meat of the nuts faster.


Biological Constraints on Learning

To understand the arguments over nurture and nature, we must explore how the ideas of behaviorism became linked to two principles about the laws of learning:

General Process Theory, and
the Principle of Equipotentiality.

An adherent of general process theory would contend that all examples of associative learning involve the same basic underlying mechanism or process. This view allowed learning theorists to pick model systems for studying the process of learning per se, and to construct laboratory experiments in which artificial testing conditions minimized any influence of species-specific behaviors. The mechanisms underlying learning in this view are not the proximate mechanisms that we are used to thinking of, such as neural circuitry, but rather the formal rules underlying the learning process.

The principle of equipotentiality implies that all organisms are capable of learning to associate anything with anything. In its most extreme form, an organism's life experiences are written on a tabula rasa and built up from more and more complex associations. A strict behaviorist would maintain that such experiences are immune to instinct. So began the debate between nurture and nature: behaviorists held this view of experience, while ethologists found such arguments absurd. Evidence rapidly accumulated that organisms could not learn associations between just anything. This evidence seemed to invalidate the principle of equipotentiality, but it really had no bearing on general process theory.

A simple example should suffice to make these distinctions. Rats readily make associations between taste and nausea (see Alcock for details). However, they are incapable of making associations between sounds and nausea. Conversely, rats are capable of making associations between sounds and electric shock and can learn to crouch when they hear a sound in order to avoid an electric shock, but they do not learn to associate tastes with electric shock.

Natural selection has honed organisms to learn certain associations quite well because those associations have survival value. In the case of taste aversion, the rat can make such associations because avoiding foods that cause nausea might have survival value. Likewise, avoiding shock by crouching (or not rearing) in the presence of sounds is easily learned because rats naturally associate sounds (e.g., of a predator) with the pain that even an unsuccessful predation attempt entails. However, associating taste with pain is not part of a rat's natural experience.

Examples of Learning in "Natural Contexts"

Links to examples already considered:

Taste Aversion: The Evolution of Aposematic and Mullerian Mimicry

Search Image: Operant (Instrumental) Conditioning of Blue Jays

Neighbor Stranger Recognition in Birds

Deme Recognition: Female Choice for Natal Song

Classic Song Learning

Spatial Mapping: Learning or Instinct of Stellar Navigation?


Adequacy of Associative Learning Theories and Complex Learning

These examples provide clear evidence of some apparently ordinary abilities (e.g., color aversion) and some extraordinary abilities (e.g., star maps) that animals have for learning complex information. Some of these examples fall within the scope of associative learning paradigms. However, bird song and spatial maps appear to fall outside this paradigm.

The case of color-aversion learning of noxious prey and Mullerian mimicry that we considered can be contrasted nicely with taste-aversion learning in rats. Birds tend to learn aversion to noxious stimuli through associations with color. In contrast, rats learn aversion to noxious stimuli through associations with taste. Each group relies on a different sensory modality, but the general process of learning underlying each may be similar. Again, this is not to say that the neural mechanisms are the same, but rather that the principles of learning are similar. However, each group appears to be constrained in what it can learn about aversive stimuli (e.g., rats do not readily learn visual cues, and birds tend not to learn taste cues).

Another example that we have studied, bird song, does not readily fall into the category of associative learning. In this case, a template of the song is formed in memory, and through a process of rehearsal, the bird learns his species-specific song. Acquisition of language in general may involve such rehearsal-based styles of learning.

The cases of spatial maps indicate that even more complex kinds of learning, beyond the scope of associative learning experiments, take place in natural contexts. The learning involved in stellar maps is not well understood. However, other forms of spatial learning, such as maze running in rats, appear to be associated with a specific region of the brain known as the hippocampus. Maze-running paradigms have been revived in recent years because it appears that rats form cognitive maps of their environment, and that these maps are located in the hippocampus.


Cognitive Views of Behavior

A cognitivist view of behavior contends that sensory data are organized by internal mechanisms and abstracted into an internalized representation of the world. Recall Leslie Real's definition of a cognitive mechanism, which we considered in our discussion of memory and foraging in honey bees:

perception -- a unit of information from the environment is collected and stored in memory,
data manipulation -- several units of information that are stored in memory are analyzed according to computational rules built into the nervous system,
forming a representation of the environment -- a complete "picture" is formed from the processing of all the information and the organism bases its decision on the complete picture or representation of the environment.

Let us explore this definition in terms of our example of risk aversion in bumble bees:

Perception. The bee drinks nectar, and either the time it takes to feed from a flower or how stretched its crop becomes from feeding is stored in memory, along with the color of the flower.

Data manipulation. Based on this single piece of data, the bee decides either to visit another flower of the same color or to avoid that color and, by default, sample a flower of the other color. Thus, fullness from feeding is fed into a simple decision rule, and avoidance of or attraction to a flower color results.

Representation. The representation of the environment is made in terms of energy content or reward (or risk in reward) and flower color. The bee somehow forms an association between flower color and reward (or risk in the reward), and this association is its abstraction of the environment.
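The three stages can be sketched for a single bee. This is a hypothetical illustration, not Real's own model: it assumes a one-item memory per flower color as the "representation", a fixed acceptance `threshold` (set here to the 2-microliter reward of the non-varying flowers used later in this chapter), and a stay-or-switch rule as the "data manipulation". The class and method names are invented.

```python
from dataclasses import dataclass, field


@dataclass
class BeeMemory:
    """Hypothetical bee: remembers the last reward seen for each flower color."""
    threshold: float = 2.0  # assumed acceptance threshold, in microliters
    remembered: dict[str, float] = field(default_factory=dict)  # the representation

    def perceive(self, color: str, reward: float) -> None:
        # Perception: store this visit's reward together with the flower color.
        self.remembered[color] = reward

    def choose(self, last_color: str, other_color: str) -> str:
        # Data manipulation: apply a simple rule to the single stored datum.
        # Stay with the color just visited if it paid at least `threshold`;
        # otherwise switch, by default, to the other color.
        if self.remembered.get(last_color, 0.0) >= self.threshold:
            return last_color
        return other_color


bee = BeeMemory()
bee.perceive("blue", 0.0)             # an empty blue flower
print(bee.choose("blue", "yellow"))   # yellow
bee.perceive("yellow", 2.0)           # a non-varying yellow flower
print(bee.choose("yellow", "blue"))   # yellow
print(bee.remembered)                 # {'blue': 0.0, 'yellow': 2.0}
```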


Optimal Foraging Theory and Learning

Optimality models assume that animals have evolved the means to behave optimally under a set of constraints: natural selection has shaped the behavior of animals toward maximum energy intake per unit of time and minimum costs associated with behaviors. Pure models of optimal foraging assume that the animal is omniscient, that is, that it has detailed knowledge of the environment, so that its choices are not constrained by the need to sample. However, optimality models have rarely dealt with learning. One reason may be that an animal that is learning cannot be optimal and, conversely, an animal that is behaving optimally cannot be learning.

But one may argue, couldn't animals LEARN optimally?

If so, what would an optimal learning rule be like?

Can animals learn in an optimal way?

Let us begin with the fundamental assumption of optimal foraging:

Optimal foraging models were erected using the assumption that the forager has omniscient knowledge of the environment, and thus nothing to learn.

However, a broader concern of foraging theory is how animals forage with incomplete information about the environment, and how foraging interacts with the acquisition of information. The main premise of foraging under incomplete information is that acquiring the information needed to make optimal choices is costly and must be traded off against current utility.

Under what environmental conditions is learning adaptive, and what learning strategies perform better than inflexible strategies?

The simple answer is that learning cannot be adaptive when there is nothing to learn. For instance, in an environment that is completely invariant over time and space, a learning strategy is at a disadvantage, because it pays the cost of sampling without any benefit. Because the environment is uniform, there are no choice spots to learn about, and an animal can simply forage without concern for whether "the grass is greener on the other side of the fence." In this case, an inflexible strategy that forages at the maximal rate throughout is likely to prevail under natural selection.

This suggests that one environment that will support learning strategies is one that is variable among foraging bouts but predictable within foraging bouts. Thinking in terms of our patch use models of the marginal value theorem, the biggest concern for a forager is to find the right patch among many patches that vary in quality.

Similarly, an environment that changes state at random is also not suitable for learning: any adjustment of learned foraging parameters based on past experience will not usefully predict the future state of the environment.

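Taken together, these cases suggest that learning is useful in a somewhat variable environment: variable enough that there is something to learn, yet predictable enough that past experience forecasts the future.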

Foraging theorists have addressed the issue of the optimal memory window. That is, if an animal learns the state of its environment as it forages and adjusts its foraging behavior according to its experience, how much of its previous experience should it use to estimate the present state of the environment and decide its future behavior? The optimal memory window depends on the pattern of variability in the environment. A very short memory window is expected in environments with high autocorrelation, that is, environments that change with a high, but not random, probability, so that recent experience predicts the near future better than older experience does. Animals chunk up their world into memory, and because no animal can remember everything, the question becomes how much an organism should remember in order to behave optimally.
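Why the best window depends on the pattern of variability can be seen in a toy comparison. This is a minimal sketch under stated assumptions: a hypothetical forager predicts each reward as the plain average of the `window` rewards before it, and both reward sequences are invented. In a stable but noisy environment a longer window averages out the noise; in an environment that switches state every few bouts (frequent but patterned change), a shorter window tracks the change better.

```python
from statistics import mean


def prediction_error(rewards: list[float], window: int) -> float:
    """Mean absolute error when each reward is predicted by the average of the
    `window` rewards that came immediately before it (hypothetical rule)."""
    errors = [abs(rewards[t] - mean(rewards[t - window:t]))
              for t in range(window, len(rewards))]
    return mean(errors)


# Stable environment with sampling noise: rewards alternate around a mean of 2.
noisy_stable = [3.0, 1.0] * 10
# Environment that switches state every 5 bouts, with no sampling noise.
switching = ([1.0] * 5 + [5.0] * 5) * 2

for window in (1, 2):
    print(window,
          round(prediction_error(noisy_stable, window), 3),
          round(prediction_error(switching, window), 3))
# 1 2.0 0.632
# 2 1.0 1.0
```

The window of 2 wins in the noisy stable environment and the window of 1 wins in the switching one, which is the trade-off the paragraph describes.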


Risk Aversive Bees Revisited and Memory Windows

Recall our example of risk aversion in bumble bees. The risky (variable) flower color had a single high-yielding flower for every two zero-yield flowers, whereas non-varying flowers provided 2 microliters in every flower. Foraging bumble bees appear to use a short memory window for making floral choices, and the nectar distribution may often be highly spatially autocorrelated. High spatial autocorrelation would imply that a single flower on a plant is likely to be representative of all flowers on that plant. If the bumble bee samples a flower and detects little nectar, it might associate the color of that flower with low reward and avoid such flowers in the future. It would then focus its efforts on the other kind of flower, whose reward is more stable.

Let us review the ideas of memory window size for bumble bees:

Let's think of a simple model of memory and how it might influence foraging. Imagine that a bumble bee keeps a fixed-length list of items in memory, each describing the profitability (energy gained, E, per unit time, T) of an item encountered during bouts of foraging. This list has been termed a memory window.

E1/T1 (last item encountered)

E2/T2 (second last item encountered)

E3/T3 (third last item encountered), and so on.
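A minimal sketch of such a window, assuming (hypothetically) that each item stores a profitability E/T and that the bee's estimate is the plain average of whatever the window currently holds. The class and method names are invented for illustration, and treating an empty memory as zero reward is likewise an assumption.

```python
from collections import deque


class MemoryWindow:
    """Hypothetical fixed-length memory of recent profitabilities (E/T)."""

    def __init__(self, size: int) -> None:
        # A deque with maxlen drops the oldest item automatically when full.
        self.items: deque[float] = deque(maxlen=size)

    def remember(self, energy: float, time: float) -> None:
        # The newest profitability goes to the front, so items[0] is E1/T1.
        self.items.appendleft(energy / time)

    def estimate(self) -> float:
        # Average profitability over everything still inside the window;
        # an empty memory is (by assumption) treated as zero reward.
        if not self.items:
            return 0.0
        return sum(self.items) / len(self.items)


window = MemoryWindow(size=2)
for energy, time in [(6.0, 1.0), (0.0, 1.0), (0.0, 1.0)]:
    window.remember(energy, time)
print(list(window.items))  # [0.0, 0.0]
print(window.estimate())   # 0.0
```

With a window of 2, the 6-microliter visit has already been pushed out by the two empty ones, so the bee's estimate no longer reflects it.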

Recall that if a bumble bee can remember only the last item on which it foraged (E1/T1), then this very short memory might easily explain its behavior.

Odds are 2:1 that a bumble bee will encounter an empty flower (0 microliters) when feeding on the variable-reward flowers. Thus, it would tend to avoid such flowers if its decision about which flower color to visit next were based only on the last flower encountered. A bumble bee feeding on the non-varying flowers, by contrast, would usually perceive that it was feeding on the flowers with the highest reward, and it would tend to seek out flowers of the same color. Thus, under the constraint of a one-item memory window, the bee would tend to avoid variable flowers, even though they give the same average reward as non-variable flowers.

Now let us ask how large the memory window for a bee would have to be for it to forage optimally under the conditions of risk aversion.

Consider a memory window of 2 items and the following rule (think like a bee):

If I have two items of the same flower color in memory, let me average their values and decide whether to visit those flowers:

(reward on 1st + reward on 2nd)/2.

What if the bee visits a full blue flower (6 microliters) and then an empty blue flower (0)? Considering all the possible ways that the bee can "smell the rose," so to speak, its average reward for a full flower followed by an empty one is:

(6+0)/2 = 3 (with probability 1/3*2/3 = 2/9)

it could also visit two full flowers in a row:

(6+6)/2 = 6 (with probability 1/3*1/3 = 1/9)

or two empty flowers in a row:

(0+0)/2 = 0 (with probability 2/3*2/3 = 4/9)

or an empty then a full:

(0+6)/2 = 3 (with probability 2/3*1/3 = 2/9)

The average bee would see the following average reward if it remembered two flowers:

(3*2/9 + 6*1/9 + 0*4/9 + 3*2/9) = 18/9 = 2 microliters.

Now consider its encounter with the non-variable yellow flowers in which it always gets:

2 microliters!

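On average, then, a bee with a memory window of 2 perceives the same reward (2 microliters) from the variable blue flowers as from the non-variable yellow flowers, so the average alone gives it no reason to prefer yellow. The average hides the variability, however. Over half of the bees (5/9) do not reject the blue flowers, because they perceive a reward greater than 2 (3 or 6 microliters), but 4/9 of the bees get nothing at all on both visits. Those bees reject the blue flowers because, given their two-visit memory window, they perceive them as having less reward than the yellow ones.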

We can show that in order for the bee to "know" that the flowers have the same reward, it has to have at least a memory window of 3. That might be just too much for the bee to remember.
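The hand enumeration for this first scenario (6 microliters in 1 of every 3 blue flowers, versus 2 microliters in every yellow flower) can be checked exactly and extended to windows of 1 and 3. This sketch assumes, as the calculation above does, that each landing is independently full with probability 1/3; the function name is hypothetical. It prints, for each window size, the expected perceived average, the fraction of bees that perceive nothing, and the fraction that perceive exactly the true mean of 2 microliters.

```python
from fractions import Fraction
from itertools import product


def perceived_average(window: int, full: int = 6,
                      p_full: Fraction = Fraction(1, 3)) -> dict[Fraction, Fraction]:
    """Exact distribution of the average reward a bee perceives when it averages
    `window` landings on variable flowers (each full with probability p_full)."""
    dist: dict[Fraction, Fraction] = {}
    for visits in product((full, 0), repeat=window):   # every full/empty sequence
        prob = Fraction(1)
        for reward in visits:
            prob *= p_full if reward == full else 1 - p_full
        avg = Fraction(sum(visits), window)
        dist[avg] = dist.get(avg, Fraction(0)) + prob
    return dist


for window in (1, 2, 3):
    dist = perceived_average(window)
    expected = sum(avg * p for avg, p in dist.items())
    print(window, expected, dist[Fraction(0)], dist.get(Fraction(2), Fraction(0)))
# 1 2 2/3 0
# 2 2 4/9 0
# 3 2 8/27 4/9
```

The expected perceived average is 2 microliters for every window size; what changes is the spread. Only with a window of 3 can a bee perceive exactly 2 microliters, which is one way to read the claim that the bee needs a window of at least 3 to "know" the two flower colors are equivalent.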

We can apply the same arguments to a bee that must learn that variable flowers give more reward than non-variable flowers.

Let us assume that the reward on every third flower is 12 microliters, but zero at the other 2 flowers. This is an average reward of 4 microliters, which is now twice the reward of the non-variable flower.
The non-variable flowers still have a reward of 2 microliters per flower.

Assume the bee has a memory window that is 1 frame long.

Then consider the odds that a bee will reject the blue flower after a single visit. Again, because 2 flowers out of 3 are empty, 2/3 of the bees will reject it on their first landing and opt for the non-variable flower. Even the 1 bee in 3 that gets lucky on its first landing will eventually come up empty on some later landing.

Thus, the bees will more than likely reject the variable flower in favor of the non-variable flower, despite the better reward!!
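The claim that even a lucky bee will eventually come up empty follows from multiplying probabilities. Assuming independent landings, a bee with a one-item window keeps choosing the blue flowers only while every landing is full, which happens with probability (1/3)^n after n landings, whatever the size of the reward in the full flowers. A quick check with exact fractions (the function name is hypothetical):

```python
from fractions import Fraction


def p_never_empty(n_landings: int, p_full: Fraction = Fraction(1, 3)) -> Fraction:
    """Probability that a bee with a one-item memory window finds a full
    variable flower on each of `n_landings` independent landings."""
    return p_full ** n_landings


for n in (1, 3, 5):
    print(n, p_never_empty(n))
# 1 1/3
# 3 1/27
# 5 1/243
```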

Let's say the bee has a memory window that is 2 frames long. It could visit a full flower and then an empty one:

(12+0)/2 = 6 (with probability 1/3*2/3 = 2/9)

it could also visit two full flowers in a row:

(12+12)/2 = 12 (with probability 1/3*1/3 = 1/9)

or two empty flowers in a row:

(0+0)/2 = 0 (with probability 2/3*2/3 = 4/9)

or an empty then a full:

(0+12)/2 = 6 (with probability 2/3*1/3 = 2/9)

The average bee would see the following average reward if it remembered two flowers:

(6*2/9 + 12*1/9 + 0*4/9 + 6*2/9) = 36/9 = 4 microliters, but more importantly, 5/9 of the bees would perceive more than 2 microliters on their first two landings and be likely to keep feeding.
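The 5/9 figure can also be checked by simulation. This sketch assumes a hypothetical decision rule: a bee averages its first `window` landings on the variable flowers and keeps feeding there only if that average exceeds the constant 2-microliter reward, with each landing independently full (12 microliters) with probability 1/3, as in the hand calculation. The seed and the number of simulated bees are arbitrary choices.

```python
import random


def fraction_keep_feeding(n_bees: int, p_full: float, full: float,
                          alternative: float, window: int, seed: int = 1) -> float:
    """Fraction of simulated bees whose average over their first `window`
    landings on variable flowers beats the constant `alternative` reward."""
    rng = random.Random(seed)            # seeded so runs are repeatable
    keep = 0
    for _ in range(n_bees):
        rewards = [full if rng.random() < p_full else 0.0 for _ in range(window)]
        if sum(rewards) / window > alternative:
            keep += 1
    return keep / n_bees


# Second scenario: 12 microliters in 1 of every 3 variable flowers vs. 2 everywhere.
for window in (1, 2):
    print(window, fraction_keep_feeding(100_000, 1 / 3, 12.0, 2.0, window))
```

With 100,000 simulated bees, the window-1 fraction should come out close to 1/3 and the window-2 fraction close to 5/9 (about 0.556).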

Finally, let's consider more variability in the high-value reward:

for the variable flowers, 1 in 4 flowers has 16 microliters (an average of 4 microliters per flower), and
the non-variable flower has 2 microliters.

Let's say the bee has a memory window that is 2 frames long. It could visit a full flower and then an empty one:

(16+0)/2 = 8 (with probability 1/4*3/4 = 3/16)

it could also visit two full flowers in a row:

(16+16)/2 = 16 (with probability 1/4*1/4 = 1/16)

or two empty flowers in a row:

(0+0)/2 = 0 (with probability 3/4*3/4 = 9/16)

or an empty then a full:

(0+16)/2 = 8 (with probability 3/4*1/4 = 3/16)

The average bee would see the following average reward if it remembered two flowers:

(8*3/16 + 16*1/16 + 0*9/16 + 8*3/16) = 64/16 = 4 microliters.

Even though the variable flowers offer a whoppingly high reward, 9/16 of the bees would see nothing on their first two landings and would be likely to leave!!

This is because the reward has become too variable for a memory window of 2, and a bee would need a memory window of 3 to do the right thing. A memory window of 3 would reduce the class of bees that get nothing on their first landings to 3/4*3/4*3/4 = 27/64 (less than 50%).
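Because only the number of full flowers matters, the distribution in this third scenario can also be written with binomial coefficients instead of listing every sequence of visits. A sketch with exact fractions, assuming independent landings with a 1-in-4 chance of a 16-microliter flower (the function name is hypothetical). It confirms that the expected perceived reward is 4 microliters for both window sizes, while the get-nothing class shrinks from 9/16 to 27/64.

```python
from fractions import Fraction
from math import comb


def perceived_distribution(full: int, p_full: Fraction,
                           window: int) -> dict[Fraction, Fraction]:
    """Exact distribution of the average reward perceived over `window`
    independent landings, each full (`full` microliters) with probability p_full."""
    dist = {}
    for j in range(window + 1):               # j = number of full flowers seen
        prob = comb(window, j) * p_full**j * (1 - p_full)**(window - j)
        dist[Fraction(j * full, window)] = prob
    return dist


for window in (2, 3):
    d = perceived_distribution(16, Fraction(1, 4), window)
    mean_seen = sum(avg * p for avg, p in d.items())
    print(window, d[Fraction(0)], mean_seen)
# 2 9/16 4
# 3 27/64 4
```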

Bottom Line: As the variability of the environment goes up, a forager needs a larger and larger memory window to forage optimally. Otherwise it will appear to be risk averse.

