Two-Sided, Genetics-Based Learning To
Discover Novel Fighter Combat Maneuvers
R. E. Smith1, B. A. Dike2, B. Ravichandran3, A. El-Fallah3, and R. K. Mehra3
1 The Intelligent Computing Systems Centre, Bristol, UK, robert.smith@uwe.ac.uk
2 The Boeing Company, St. Louis, MO, USA, bruce.a.dike@boeing.com
3 Scientific Systems, Woburn, MA, USA, {ravi, adel, rkm}@ssci.com
Abstract. This paper reports the authors’ ongoing experience with a
system for discovering novel fighter combat maneuvers, using a geneticsbased machine learning process, and combat simulation. In effect, the
genetic learning system in this application is taking the place of a test
pilot, in discovering complex maneuvers from experience. The goal of
this work is distinct from that of many other studies, in that innovation
and the discovery of novelty (as opposed to optimality) are in themselves valuable.
This makes the details of aims and techniques somewhat distinct from
other genetics-based machine learning research.
This paper presents previously unpublished results that show two co-adapting players in similar aircraft. The complexities of analyzing these
results, given the Red Queen effect, are discussed. Finally, general implications of this work are considered.
1 Introduction
New technologies for fighter aircraft are being developed continuously. Often,
aircraft engineers can know a great deal about the aerodynamic performance
of new fighter aircraft that exploit new technologies, even before a physical
prototype is constructed or flown. Such aerodynamic knowledge is available from
design principles, from computer simulation, and from wind tunnel experiments.
Evaluating the impact of new technologies on actual combat can provide
vital feedback to designers, to customers, and to future pilots of the aircraft in
question. However, this feedback typically comes at a high price. While designers
can use fundamental design principles (e.g., tight turning capacity is good) to
shape their designs, good maneuvers often lie in odd parts of the aircraft
performance space, and in the creativity and innovation of the pilot.
Therefore, the typical process would be to develop a new aircraft, construct a
one-off prototype, and allow test pilots to experiment with the prototype, developing maneuvers in simulated combat. Clearly, the expense of such a prototype
is substantial. Moreover, simulated combat with highly trained test pilots has a
substantial price tag. Therefore, it would be desirable to discover the maneuver
utility of new technologies, without a physical prototype.
In the approach pursued in the authors’ past work [4, 6], and in the new
work presented here, an adaptive machine learning system takes the place of
the test pilot in simulated combat. This approach has several advantages.
As in a purely analytical approach, this approach requires a model. However,
in this case the model need only be accurate for purposes of combat simulation.
It need not take a mathematically tractable form.
Moreover, the approach is similar to that of man-in-the-loop simulation, except in this case the machine learning “pilot” has no bias dictated by past
experiences with real aircraft, no prejudices against simulated combat, and no
tendency to tire of the simulated combat process after hundreds or thousands of
engagements. Also, one overcomes the constraints of real-time simulation.
This paper considers ongoing work in this area in terms of its unique character
as a machine learning and adaptive systems problem. To recognize the difference
between this problem and more typical machine learning problems, one must
consider its ultimate goal. This work is directed at filling the role of test pilot
in the generation of innovative, novel maneuvers. The work is not directed at
online control. That is to say, the machine learning system is not intended to
generate part of an algorithm for controlling a real fighter aircraft. Like the test
pilot in simulated combat, the machine learning system can periodically fail,
without worry that the associated combat failure will result in possible loss of
hardware and personnel. In many ways, the machine learning system is even less
constrained than the test pilot, in that it is more willing to experiment with
maneuvers that would be dangerous in fighter combat with real aircraft. The
goal of this work is the process of innovation and novelty, rather than discovering
optimality.
2 The LCS Used Here
Details of the LCS used here are briefly provided below. For a more detailed
discussion, see previous papers [4, 5].
The LCS interacts in simulated, 1-versus-1 combat, through AASPEM, the
Air-to-Air System Performance Evaluation Model. AASPEM is a U.S. Government computer simulation of air-to-air combat, and is one of the standard models
for this topic. The classifier actions directly fire effectors (there are no internal
messages).
In our system, if no classifiers are matched by the current message, a default
action for straight, level flight is used. There is no “cover” operator [7].
At the end of an engagement, the “measure of effectiveness” score for the
complete engagement is calculated. This score is assigned as the fitness for every classifier that acted during the engagement (and to any duplicates of these
classifiers). Note that this score replaces the score given by averaging the parent scores when the GA generated the rule. Thus, rules that do not fire simply
“inherit” the averaged fitness of their GA parents [6].
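As an illustration, this end-of-engagement credit assignment can be sketched roughly as follows. This is a simplified reconstruction in Python, not the code coupled to AASPEM; the Classifier record and the function name are our own.

```python
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str   # ternary string over {'1', '0', '#'}
    action: str      # maneuver command sent to the simulation effectors
    fitness: float   # starts as the average fitness of the GA parents

def assign_engagement_credit(population, fired, engagement_score):
    """Overwrite the fitness of every classifier that acted during the
    engagement (and of any duplicate copies of those classifiers) with
    the engagement-level measure-of-effectiveness score.  Classifiers
    that never fired keep the fitness inherited from their GA parents."""
    fired_keys = {(c.condition, c.action) for c in fired}
    for c in population:
        if (c.condition, c.action) in fired_keys:
            c.fitness = engagement_score
```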
Our efforts have included an evaluation of different measures of effectiveness
within the genetics-based machine learning system, to determine the relative
sensitivity of the process. Initial candidate measures included exchange ratio,
time on advantage, time to first kill, and other relevant variables.
The measure of effectiveness ultimately selected to feed back into the GA
fitness function was computed in the following steps. The base score was
a linear function of average angular advantage (opponent target aspect angle
minus ownship target aspect angle). To encourage maneuvers that might enable
gun firing opportunities, an additional score was added when the target was
within 5 degrees of the aircraft’s nose. A tax was applied to non-firing classifiers
to discourage the proliferation of parasite classifiers that contain elements of
high-performance classifiers but have insufficient material for activation. All non-firing classifiers that were identical to a firing classifier were reassigned the firing
classifier’s fitness.
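A minimal sketch of this scoring scheme is given below, assuming the simulation provides, for each time step, each aircraft’s target aspect angle in degrees (treated here as the angle of the opponent off that aircraft’s nose); the function name, the per-step gun bonus, and the uniform time step are our assumptions, not the values used in the AASPEM-coupled runs.

```python
def measure_of_effectiveness(own_aspect_deg, opp_aspect_deg,
                             gun_cone_deg=5.0, gun_bonus_per_step=1.0):
    """Engagement-level score: a linear function of the average angular
    advantage (opponent target aspect angle minus ownship target aspect
    angle), plus an additional score for every time step in which the
    target sits within 5 degrees of the ownship's nose."""
    steps = len(own_aspect_deg)
    angular_advantage = sum(opp - own for own, opp
                            in zip(own_aspect_deg, opp_aspect_deg)) / steps
    gun_score = gun_bonus_per_step * sum(1 for own in own_aspect_deg
                                         if own <= gun_cone_deg)
    return angular_advantage + gun_score
```

The parasite tax and the reassignment of fitness to non-firing duplicates described above are then applied to individual classifiers during credit assignment, after this engagement-level score has been computed.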
The GA acts at the end of each 30-second engagement. The GA is panmictic (it acts over the entire population). In some of our experiments, the entire
classifier list is replaced each time the GA is applied. This has been surprisingly successful, despite the expected disruption of the classifier list. In recent
experiments, we have used a generation gap of 0.5 (replacing half of the classifier
population with the GA). A new, GA-created classifier is assigned a fitness that
is the average of the fitness values of its “parent” classifiers. The GA employed tournament selection, with a tournament size ranging from 2 to 8. Typical
GA parameters are a crossover probability of 0.95, and a mutation rate of 0.02
per bit position. When a condition bit is selected for mutation, it is set to one
of the three possible character values (1, 0, or #), with equal probability. Note
that this yields an effective mutation probability of (0.02)(2/3) ≈ 0.0133.
Child rules replace randomly selected rules in the population.
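The genetic operators described in this paragraph can be sketched roughly as follows; the rule encoding, the fixed tournament size, and the decision to apply the ternary mutation to every position (rather than only to condition bits) are simplifications of ours.

```python
import random

ALPHABET = ('1', '0', '#')

def tournament(fitness, k=4):
    """Index of the fittest of k classifiers drawn at random."""
    return max(random.sample(range(len(fitness)), k), key=lambda i: fitness[i])

def crossover(a, b, pc=0.95):
    """With probability 0.95, one-point crossover between two rule strings."""
    if random.random() >= pc:
        return a
    point = random.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(rule, rate=0.02):
    """Each position mutates with probability 0.02, drawing uniformly from
    {1, 0, #}; the draw can reproduce the old symbol, so the effective
    change probability is 0.02 * 2/3 (approximately 0.0133)."""
    return ''.join(random.choice(ALPHABET) if random.random() < rate else ch
                   for ch in rule)

def ga_step(rules, fitness, gap=0.5, k=4):
    """End-of-engagement GA over the whole (panmictic) rule population:
    breed gap * N children by tournament selection, crossover, and
    mutation; each child overwrites a randomly chosen rule and inherits
    the average fitness of its two parents."""
    n_children = int(gap * len(rules))
    for _ in range(n_children):
        i, j = tournament(fitness, k), tournament(fitness, k)
        child = mutate(crossover(rules[i], rules[j]))
        slot = random.randrange(len(rules))
        rules[slot] = child
        fitness[slot] = 0.5 * (fitness[i] + fitness[j])
    return rules, fitness
```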
The matching rule with the highest fitness/strength is selected to act deterministically.
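Combined with the default action described earlier, the match-and-act cycle can be sketched as below, reusing the Classifier record from the earlier sketch; the message encoding and the name of the default action are illustrative.

```python
def matches(condition, message):
    """A ternary condition matches a binary message when every
    non-# position agrees with the corresponding message bit."""
    return all(c == '#' or c == m for c, m in zip(condition, message))

def select_action(population, message, default_action='straight_level'):
    """Deterministic action selection: of all classifiers whose condition
    matches the current environment message, the one with the highest
    fitness (strength) fires.  If no classifier matches, the default
    straight-and-level action is used (there is no cover operator)."""
    matched = [c for c in population if matches(c.condition, message)]
    if not matched:
        return default_action
    return max(matched, key=lambda c: c.fitness).action
```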
We have used a number of starting conditions for the 1-v.-1 combat simulations considered here. The primary source document for these conditions was the
X-31 Project Pinball II Tactical Utility Summary, which contained results from
manned simulation engagements conducted in 1993 at Ottobrunn, Germany [1].
The starting condition we will consider in this paper is the Slow-Speed Line
Abreast (SSLA), where the aircraft begin combat side by side, pointing in the
same direction.
3 Two-Sided Learning Results
In recent work with the fighter combat LCS, we have allowed both opponents to
adapt under the action of a GA [5]. This ongoing effort complicates the fighter
combat problem, and the interpretation of simulation results. Because of the Red
Queen effect [2], the dynamic system created by the two players has several possible
attractors. These include fixed points, periodic behaviors, chaotic behaviors,
and arms races. The latter is clearly the behavior we want our simulations to
encourage. Our current results have (qualitatively) shown promise in this area
(i.e., we have seen an escalation of strategies between the two aircraft).
A number of approaches to two-sided learning have been considered. In each
approach, a “run” consists of 300 simulated combat engagements. Results in this
paper consider the following approach:
Alternate freeze learning with memory (MEM): This learning scheme
can be viewed as an extended version of the alternate freeze (ALT) learning scheme. At the end of each run,
the results of the 300 engagements are scanned to obtain the highest measure
of effectiveness. The rules from the highest scoring engagement are used for the
frozen strategy in the next run. Furthermore, these rules are memorized and
are added to the population in the upcoming learning sequence runs. Thus, the
system has memory of its previously learned behavior.
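The outer loop of the MEM scheme can be sketched roughly as follows. The run_engagements callable stands in for a learning run of 300 AASPEM engagements and is assumed to return one (score, rules used) pair per engagement; that interface, like the other names here, is an assumption for illustration.

```python
def mem_learning(run_engagements, initial_rules, n_runs=10,
                 engagements_per_run=300):
    """Alternate freeze learning with memory (MEM): the two players take
    turns learning.  After each run, the learner's rules from its highest
    scoring engagement become its frozen strategy while the other player
    learns, and are also memorized and re-seeded into its population the
    next time it takes the learning role."""
    frozen = [list(initial_rules[0]), list(initial_rules[1])]
    memory = [[], []]
    for run in range(n_runs):
        learner, opponent = run % 2, (run + 1) % 2
        results = run_engagements(seed_rules=memory[learner],
                                  opponent_rules=frozen[opponent],
                                  n=engagements_per_run)
        best_score, best_rules = max(results, key=lambda r: r[0])
        frozen[learner] = list(best_rules)    # frozen strategy for later runs
        memory[learner] += list(best_rules)   # remembered for future learning
    return frozen
```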
3.1 Similar Aircraft (X-31 v. X-31)
This section presents results where two players in similar aircraft (both X-31s)
co-adapt to one another.
Before examining the results graphically, it is useful to consider the progression of raw scores observed. These results are shown in Table 1. We will
distinguish the two X-31s by their initial configurations. Relative to their SSLA
starting conditions, we will call the player initially on the right player R and the
player initially on the left player L.
Table 1. Progression of Scores for one player (Player R) in a simulation with two X-31 aircraft co-adapting with LCSs.

Learning Run    Best Score of Player R
     1                 49.53379
     2                -38.88130
     3                 48.49355
     4                  1.854810
     5                 72.52103
     6                -21.01414
     7                 87.11726
     8                 -7.360970
     9                 79.42159
    10                 30.43967
Note the nature of this progression. Player R’s relative superiority alternates
as a result of the system’s learning. In other words, the player that is currently
adapting generally holds the advantage. Note that the players’ interactions do not seem to
evolve towards a fixed-point compromise, but seem to continue to adapt. This
leaves the possibility of periodic, chaotic, or (the desirable) arms race behavior.
We can gain some insight by examining plots of the “best” (learning) player’s
dominance in each run. Note that these are typical results, and that each figure
is shown from a slightly different angle, for clarity.
Fig. 1. “Best” maneuver in learning a) run 1 where Player R is learning and b) run 2
where Player L is learning.
Figure 1 a) shows the “best” maneuver discovered in learning run 1, where
the player starting on the right (player R) has been learning under the action of
the GA, and player L has followed standard combat logic. This maneuver is best
in the sense of player R’s raw score. Player R has learned to dominate player
L, by staying inside player L’s turning radius, and employing a helicopter gun
maneuver. This is one of the post-stall tactic (PST) maneuvers often discovered
by the LCS in our one-sided learning experiments. Figure 1 b) shows the results
from the next learning run, where player R follows the strategy dictated by the
rules employed in the maneuver shown in Figure 1 a). Note the shadow trace of
player L shown at the bottom of this figure. Player L has learned to respond to
player R’s helicopter gun maneuver with a J-turn (a turn utilizing Herbst-like
maneuvering) to escape. This is our first evidence of one player trumping the
advanced, PST maneuver learned by its opponent, by learning a PST maneuver
of its own.
Fig. 2. “Best” maneuver in learning a) run 3 where Player R is learning and b) run 4
where Player L is learning.
Figure 2 a) shows the response when player R returns to the learning role
in run 3. Player R learns to abandon the helicopter gun maneuver given player
L’s J-turn escape. In this run, both players are exhibiting Herbst or J-turn
type maneuvers. Note that player L, while not learning, remains responsive to
changes in player R’s maneuver, due to activation of different rules at different
times in the run. At this stage, both players have reached similar strategy levels,
by exploiting so-called “out-of-plane” behavior (three-dimensional maneuvering,
with drastic movement out of the common plane the players occupy in space).
Figure 2 b) shows player L learning to alter the end of its J-turn, such that
it turns to target player R near the end of the maneuver. Note that player R
has clearly remained responsive, despite not learning, and altered part of its
maneuver.
Fig. 3. “Best” maneuver in learning a) run 5 where Player R is learning and b) run 6
where Player L is learning.
Figure 3 a) shows a much more advanced strategy emerging on the part of
player R, once again in the learning role. This maneuver combines features of a
Herbst maneuver (high angles of attack and rolling to rapidly change directions)
and features of a helicopter gun attack (thrust-vectored nose pointing inside the
opponent’s turn). Given this advanced maneuver, player L learns in run 6 to
extend its J-turn, and escape the fight (Figure 3 b)).
Fig. 4. “Best” maneuver in learning a) run 7 where Player R is learning and b) run 8
where Player L is learning.
In run 7, player R refines its Herbst turn, putting the two players in parallel
PST turns, resulting in a steeply diving chase (Figure 4 a)). In run 8 (Figure 4
b)), player L learns to gain a few critical moments of advantage early in the
maneuver, through a brief helicopter gun attack, before extending a dive out
of the fight. Note that, as before, player R remains reactive, despite its lack of
learning in this run. In reaction to player L’s early attack, it maintains altitude
to escape, rather than following the parallel diving pursuit shown in Figure 4 a).
Fig. 5. “Best” maneuver in learning a) run 9 where Player R is learning and b) run 10
where Player L is learning.
Figure 5 a) shows the emergence of a maneuver where the players swing
and cross one another’s paths in the air, in a complex sort of “rolling scissors”
maneuver [3]. Note the shadow traces in this plot, and compare the maneuver’s
complexity to that of the diving pursuit shown in Figure 4 a). In Figure 5 b),
player L once again learns to escape player R’s advanced strategy, through a full
inversion in a rolling turn. However, note that player R has remained reactive,
and, despite its lack of learning in this run, executes an effective helicopter gun
attack early in the run.
Throughout these runs, player R (which had the advantage of being “first to
learn”) assumes a somewhat more aggressive posture. However, note that there
is a definite progression in the complexity of both players’ strategies, in reaction
to each other’s learning. This is the desired “arms race” behavior that we are
attempting to encourage, such that the system discovers increasingly interesting
and novel maneuver sets.
4 Final Comments
Many conclusions and areas for future investigation can be drawn from the work
presented here. However, as a concluding focus of this paper, one should consider
the goal of the LCS approach in the fighter aircraft application, as a guideline for future
applications of the LCS and other adaptive systems technologies. Since there is a
real, quantifiable value to the discovery of innovative, high-utility fighter combat
maneuvers, one can concentrate on the exploration and synthesis aspects of the
LCS, without particular concern for the long-term stability of any given rule set.
One should not overlook the utility of the LCS approach for generating novel,
innovative approaches to problems. In many domains (like the fighter aircraft
task), such open-ended machine innovation can have a real-world, hard-cash
value. The applicability of the adaptive systems approach to such tasks deserves
further consideration.
Acknowledgements
The authors gratefully acknowledge that this work is sponsored by the United
States Air Force (contracts F33657-97-C-2035 and F33657-98-C-2045).
The authors also gratefully acknowledge the support provided by NASA for the
early phases of this project, under grant NAS2-13994.
References
1. P. M. Doane, C. H. Gay, and J. A. Fligg. Multi-system integrated control (MUSIC) program. Final report, Wright Laboratories, Wright-Patterson AFB, OH, 1989.
2. D. Floreano and S. Nolfi. God save the red queen: Competition in co-evolutionary robotics. In Proceedings of the Second International Conference on Genetic Programming, pages 398–406. MIT Press, 1997.
3. R. L. Shaw. Fighter Combat: Tactics and Maneuvering. United States Naval Institute Press, 1998.
4. R. E. Smith and B. A. Dike. Learning novel fighter combat maneuver rules via genetic algorithms. International Journal of Expert Systems, 8(3):247–276, 1995.
5. R. E. Smith, B. A. Dike, R. K. Mehra, B. Ravichandran, and A. El-Fallah. Classifier systems in combat: Two-sided learning of maneuvers for advanced fighter aircraft. Computer Methods in Applied Mechanics and Engineering, 186:421–437, 2000.
6. R. E. Smith, B. A. Dike, and S. A. Stegmann. Fitness inheritance in genetic algorithms. In Proceedings of the 1995 ACM Symposium on Applied Computing, pages 345–350. ACM Press, 1995.
7. S. W. Wilson. ZCS: A zeroth level classifier system. Evolutionary Computation, 2(1):1–18, 1994.