Two-Sided, Genetics-Based Learning To Discover Novel Fighter Combat Maneuvers

R. E. Smith (1), B. A. Dike (2), B. Ravichandran (3), A. El-Fallah (3), and R. K. Mehra (3)

(1) The Intelligent Computing Systems Centre, Bristol, UK, robert.smith@uwe.ac.uk
(2) The Boeing Company, St. Louis, MO, USA, bruce.a.dike@boeing.com
(3) Scientific Systems, Woburn, MA, USA, {ravi, adel, rkm}@ssci.com

Abstract. This paper reports the authors' ongoing experience with a system for discovering novel fighter combat maneuvers, using a genetics-based machine learning process and combat simulation. In effect, the genetic learning system in this application takes the place of a test pilot, discovering complex maneuvers from experience. The goal of this work is distinct from that of many other studies, in that innovation, and the discovery of novelty (as opposed to optimality), is in itself valuable. This makes the details of aims and techniques somewhat distinct from other genetics-based machine learning research. This paper presents previously unpublished results that show two co-adapting players in similar aircraft. The complexities of analyzing these results, given the Red Queen effect, are discussed. Finally, general implications of this work are discussed.

1 Introduction

New technologies for fighter aircraft are being developed continuously. Often, aircraft engineers can know a great deal about the aerodynamic performance of new fighter aircraft that exploit new technologies, even before a physical prototype is constructed or flown. Such aerodynamic knowledge is available from design principles, from computer simulation, and from wind tunnel experiments. Evaluating the impact of new technologies on actual combat can provide vital feedback to designers, to customers, and to future pilots of the aircraft in question. However, this feedback typically comes at a high price. While designers can use fundamental design principles (e.g., tight turning capacity is good) to shape their designs, good maneuvers often lie in odd parts of the aircraft performance space, and depend on the creativity and innovation of the pilot. Therefore, the typical process would be to develop a new aircraft, construct a one-off prototype, and allow test pilots to experiment with the prototype, developing maneuvers in simulated combat. Clearly, the expense of such a prototype is substantial. Moreover, simulated combat with highly trained test pilots has a substantial price tag. Therefore, it would be desirable to discover the maneuver utility of new technologies without a physical prototype.

In the approach pursued in the authors' past work [4, 6], and in the new work presented here, an adaptive machine learning system takes the place of the test pilot in simulated combat. This approach has several advantages. As in a purely analytical approach, this approach requires a model. However, in this case the model need only be accurate for purposes of combat simulation; it need not take a mathematically tractable form. Moreover, the approach is similar to that of man-in-the-loop simulation, except that in this case the machine learning "pilot" has no bias dictated by past experiences with real aircraft, no prejudices against simulated combat, and no tendency to tire of the simulated combat process after hundreds or thousands of engagements. Also, one overcomes the constraints of real-time simulation. This paper considers ongoing work in this area in terms of its unique character as a machine learning and adaptive systems problem.
To recognize the difference between this problem and more typical machine learning problems, one must consider its ultimate goal. This work is directed at filling the role of the test pilot in the generation of innovative, novel maneuvers. The work is not directed at online control. That is to say, the machine learning system is not intended to generate part of an algorithm for controlling a real fighter aircraft. Like the test pilot in simulated combat, the machine learning system can periodically fail, without worry that the associated combat failure will result in possible loss of hardware and personnel. In many ways, the machine learning system is even less constrained than the test pilot, in that it is more willing to experiment with maneuvers that would be dangerous in fighter combat with real aircraft. The goal of this work is the process of innovation and the discovery of novelty, rather than optimality.

2 The LCS Used Here

Details of the LCS used here are briefly provided below. For a more detailed discussion, see previous papers [4][5]. The LCS interacts in simulated, 1-versus-1 combat, through AASPEM, the Air-to-Air System Performance Evaluation Model. AASPEM is a U.S. Government computer simulation of air-to-air combat, and is one of the standard models for this topic. The classifier actions directly fire effectors (there are no internal messages). In our system, if no classifiers are matched by the current message, a default action for straight, level flight is used. There is no "cover" operator [7].

At the end of an engagement, the "measure of effectiveness" score for the complete engagement is calculated. This score is assigned as the fitness for every classifier that acted during the engagement (and to any duplicates of these classifiers). Note that this score replaces the score given by averaging the parent scores when the GA generated the rule. Thus, rules that do not fire simply "inherit" the averaged fitness of their GA parents [6].

Our efforts have included an evaluation of different measures of effectiveness within the genetics-based machine learning system, to determine the relative sensitivity of the process. Initial candidate measures included exchange ratio, time on advantage, time to first kill, and other relevant variables. The measure of effectiveness ultimately selected to feed back into the GA fitness function was based on the following steps:

- The base score was a linear function of average angular advantage (opponent target aspect angle minus ownship target aspect angle).
- To encourage maneuvers that might enable gun firing opportunities, an additional score was added when the target was within 5 degrees of the aircraft's nose.
- A tax was applied to non-firing classifiers, to discourage the proliferation of parasite classifiers that contain elements of high-performance classifiers but have insufficient material for activation.
- All non-firing classifiers that were identical to a firing classifier were reassigned the firing classifier's fitness.

The GA acts at the end of each 30-second engagement. The GA is panmictic (it acts over the entire population). In some of our experiments, the entire classifier list is replaced each time the GA is applied. This has been surprisingly successful, despite the expected disruption of the classifier list. In recent experiments, we have used a generation gap of 0.5 (replacing half of the classifier population with the GA).
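As an illustration only, the following Python sketch shows one way the scoring and credit assignment described above might be organized. The specific weights (gun bonus, tax factor), function names, and data layout are assumptions made for the sketch; they are not details of the actual AASPEM/LCS implementation.

from dataclasses import dataclass
from typing import List

@dataclass
class Classifier:
    condition: str          # ternary string over {'0', '1', '#'}
    action: str             # effector settings, encoded as a bit string
    fitness: float = 0.0
    fired: bool = False     # did this rule act during the engagement?

def engagement_score(angular_advantage: List[float],
                     nose_angle: List[float],
                     gun_bonus: float = 10.0,        # assumed weight
                     gun_cone_deg: float = 5.0) -> float:
    """Measure of effectiveness for one engagement (illustrative weights).

    angular_advantage: per-time-step (opponent aspect - ownship aspect), degrees.
    nose_angle: per-time-step angle of the target off the ownship's nose, degrees.
    """
    base = sum(angular_advantage) / len(angular_advantage)   # linear in average advantage
    bonus = gun_bonus * sum(1 for a in nose_angle if a <= gun_cone_deg) / len(nose_angle)
    return base + bonus

def assign_credit(population: List[Classifier], score: float,
                  parasite_tax: float = 0.9) -> None:
    """Give the engagement score to every firing rule (and its duplicates);
    tax the remaining, non-firing rules."""
    fired_keys = {(c.condition, c.action) for c in population if c.fired}
    for c in population:
        if (c.condition, c.action) in fired_keys:
            c.fitness = score          # replaces the GA-inherited parental average
        else:
            c.fitness *= parasite_tax  # tax on potential parasite rules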
A new, GA-created classifier is assigned a fitness that is the average of the fitness values of its "parent" classifiers. The GA employed tournament selection, with a tournament size ranging from 2 to 8. Typical GA parameters are a crossover probability of 0.95 and a mutation rate of 0.02 per bit position. When a condition bit is selected for mutation, it is set to one of the three possible character values (1, 0, or #) with equal probability. Note that this actually yields an effective mutation probability of (0.02)(2/3) ≈ 0.0133. Child rules replaced randomly selected rules in the population. The matching rule with the highest fitness/strength is selected to act deterministically.

We have used a number of starting conditions for the 1-v.-1 combat simulations considered here. The primary source document for these conditions was the X-31 Project Pinball II Tactical Utility Summary, which contained results from manned simulation engagements conducted in 1993 at Ottobrunn, Germany [1]. The starting condition we will consider in this paper is Slow-Speed Line Abreast (SSLA), where the aircraft begin combat side by side, pointing in the same direction.

3 Two-Sided Learning Results

In recent work with the fighter combat LCS, we have allowed both opponents to adapt under the action of a GA [5]. This ongoing effort complicates the fighter combat problem, and the interpretation of simulation results. Because of the Red Queen effect [2], the dynamic system created by two players has several possible attractors. These include fixed points, periodic behaviors, chaotic behaviors, and arms races. The latter is clearly the behavior we want our simulations to encourage. Our current results have (qualitatively) shown promise in this area (i.e., we have seen an escalation of strategies between the two aircraft).

A number of approaches to two-sided learning have been considered. In each approach, a "run" consists of 300 simulated combat engagements. Results in this paper consider the following approach:

Alternate freeze learning with memory (MEM): This learning scheme can be viewed as an extended version of the alternating-freeze (ALT) learning scheme. At the end of each run, the results of the 300 engagements are scanned to obtain the highest measure of effectiveness. The rules from the highest scoring engagement are used as the frozen strategy in the next run. Furthermore, these rules are memorized and added to the population in the upcoming learning runs. Thus, the system has memory of its previously learned behavior (sketched in code following Table 1).

3.1 Similar Aircraft (X-31 v. X-31)

This section presents results where two players in similar aircraft (both X-31s) co-adapt to one another. Before examining the results graphically, it is useful to consider the progression of raw scores observed. These results are shown in Table 1. We will distinguish the two X-31s by their initial configurations: relative to their SSLA starting conditions, we will call the player initially on the right player R and the player initially on the left player L.

Table 1. Progression of scores for one player (Player R) in a simulation with two X-31 aircraft co-adapting with LCSs.

Learning run    Best score of Player R
     1               49.53379
     2              -38.88130
     3               48.49355
     4                1.854810
     5               72.52103
     6              -21.01414
     7               87.11726
     8               -7.360970
     9               79.42159
    10               30.43967

Note the nature of this progression. Player R's relative superiority alternates as a result of the system's learning: scores are high in the odd-numbered runs, where Player R is learning, and low or negative in the even-numbered runs, where Player L is learning. In other words, the player that is adapting has a continual advantage.
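To make the MEM bookkeeping concrete, the sketch below abstracts the AASPEM combat simulation behind a run_engagements() callable and uses a simple per-engagement record. The names, the per-player memory structure, and the exact way memorized rules are re-injected into the population are assumptions for illustration only, not the actual implementation.

from typing import Callable, List, NamedTuple

class EngagementResult(NamedTuple):
    score: float   # measure of effectiveness for one engagement
    rules: list    # the classifier list that produced that engagement

def mem_learning(rules_r: list, rules_l: list,
                 run_engagements: Callable[[list, list, int], List[EngagementResult]],
                 n_runs: int = 10, engagements_per_run: int = 300):
    """Alternate-freeze learning with memory (MEM), roles swapping each run."""
    players = {"R": rules_r, "L": rules_l}
    memory = {"R": [], "L": []}        # each player's previously learned rules
    learning, frozen = "R", "L"        # Player R learns first
    for _ in range(n_runs):
        # Seed the learning player's population with its memorized rules.
        population = players[learning] + memory[learning]
        # One run: engagements against the frozen opponent, with the GA
        # applied after each 30-second engagement (inside run_engagements).
        results = run_engagements(population, players[frozen], engagements_per_run)
        # Scan the run for the engagement with the highest measure of effectiveness.
        best = max(results, key=lambda r: r.score)
        memory[learning] = memory[learning] + best.rules
        # The best rules become this player's frozen strategy,
        # and the roles swap for the next run.
        players[learning] = best.rules
        learning, frozen = frozen, learning
    return players, memory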
Returning to the score progression in Table 1, note that the players' interactions do not seem to evolve towards a fixed-point compromise, but seem to continue to adapt. This leaves the possibility of periodic, chaotic, or (the desirable) arms race behavior. We can gain some insight by examining plots of the "best" (learning) player's dominance in each run. Note that these are typical results, and that each figure is shown from a slightly different angle, for clarity.

Fig. 1. "Best" maneuver in learning: a) run 1, where Player R is learning, and b) run 2, where Player L is learning.

Figure 1 a) shows the "best" maneuver discovered in learning run 1, where the player starting on the right (player R) has been learning under the action of the GA, and player L has followed standard combat logic. This maneuver is best in the sense of player R's raw score. Player R has learned to dominate player L, by staying inside player L's turning radius and employing a helicopter gun maneuver. This is one of the post-stall tactic (PST) maneuvers often discovered by the LCS in our one-sided learning experiments. Figure 1 b) shows the results from the next learning run, where player R follows the strategy dictated by the rules employed in the maneuver shown in Figure 1 a). Note the shadow trace of player L shown at the bottom of this figure. Player L has learned to respond to player R's helicopter gun maneuver with a J-turn (a turn utilizing Herbst-like maneuvering) to escape. This is our first evidence of one player trumping the advanced, PST maneuver learned by its opponent, by learning a PST maneuver of its own.

Fig. 2. "Best" maneuver in learning: a) run 3, where Player R is learning, and b) run 4, where Player L is learning.

Figure 2 a) shows the response when player R returns to the learning role in run 3. Player R learns to abandon the helicopter gun maneuver, given player L's J-turn escape. In this run, both players are exhibiting Herbst or J-turn type maneuvers. Note that player L, while not learning, remains responsive to changes in player R's maneuver, due to activation of different rules at different times in the run. At this stage, both players have reached similar strategy levels, by exploiting so-called "out-of-plane" behavior (three-dimensional maneuvering, with drastic movement out of the common plane the players occupy in space). Figure 2 b) shows player L learning to alter the end of its J-turn, such that it turns to target player R near the end of the maneuver. Note that player R has clearly remained responsive, despite not learning, and altered part of its maneuver.

Fig. 3. "Best" maneuver in learning: a) run 5, where Player R is learning, and b) run 6, where Player L is learning.

Figure 3 a) shows a much more advanced strategy emerging on the part of player R, once again in the learning role. This maneuver combines features of a Herbst maneuver (high angles of attack and rolling to rapidly change directions) and features of a helicopter gun attack (thrust-vectored nose pointing inside the opponent's turn). Given this advanced maneuver, player L learns in run 6 to extend its J-turn and escape the fight (Figure 3 b)).

Fig. 4. "Best" maneuver in learning: a) run 7, where Player R is learning, and b) run 8, where Player L is learning.

In run 7, player R refines its Herbst turn, putting the two players in parallel PST turns, resulting in a steeply diving chase (Figure 4 a)).
In run 8 (Figure 4 b)), player L learns to gain a few critical moments of advantage early in the maneuver, through a brief helicopter gun attack, before extending a dive out of the fight. Note that, as before, player R remains reactive, despite its lack of learning in this run. In reaction to player L's early attack, player R maintains altitude to escape, rather than following the parallel diving pursuit shown in Figure 4 a).

Fig. 5. "Best" maneuver in learning: a) run 9, where Player R is learning, and b) run 10, where Player L is learning.

Figure 5 a) shows the emergence of a maneuver where the players swing and cross one another's paths in the air, in a complex sort of "rolling scissors" maneuver [3]. Note the shadow traces in this plot, and compare the maneuver's complexity to that of the diving pursuit shown in Figure 4 a). In Figure 5 b), player L once again learns to escape player R's advanced strategy, through a full inversion in a rolling turn. However, note that player R has remained reactive, and, despite its lack of learning in this run, executes an effective helicopter gun attack early in the run.

Throughout these runs, player R (which had the advantage of being "first to learn") assumes a somewhat more aggressive posture. However, note that there is a definite progression in the complexity of both players' strategies, in reaction to each other's learning. This is the desired "arms race" behavior that we are attempting to encourage, such that the system discovers increasingly interesting and novel maneuver sets.

4 Final Comments

Many conclusions and areas for future investigation can be drawn from the work presented here. However, as a concluding focus of this paper, one should consider the goal of the LCS approach in the fighter combat application, as a guideline for future applications of the LCS and other adaptive systems technologies. Since there is a real, quantifiable value to the discovery of innovative, high-utility fighter combat maneuvers, one can concentrate on the exploration and synthesis aspects of the LCS, without particular concern for the long-term stability of any given rule set. One should not overlook the utility of the LCS approach for generating novel, innovative approaches to problems. In many domains (like the fighter aircraft task), such open-ended machine innovation can have a real-world, hard-cash value. The applicability of the adaptive systems approach to such tasks deserves further consideration.

Acknowledgements

The authors gratefully acknowledge that this work is sponsored by the United States Air Force (Air Force F33657-97-C-2035 and Air Force F33657-98-C-2045). The authors also gratefully acknowledge the support provided by NASA for the early phases of this project, under grant NAS2-13994.

References

1. P. M. Doane, C. H. Gay, and J. A. Fligg. Multi-system integrated control (MUSIC) program. Technical Report (Final Report), Wright Laboratories, Wright-Patterson AFB, OH, 1989.
2. D. Floreano and S. Nolfi. God save the red queen: Competition in co-evolutionary robotics. In Proceedings of the Second International Conference on Genetic Programming, pages 398–406. MIT Press, 1997.
3. R. L. Shaw. Fighter Combat: Tactics and Maneuvering. United States Naval Institute Press, 1998.
4. R. E. Smith and B. A. Dike. Learning novel fighter combat maneuver rules via genetic algorithms. International Journal of Expert Systems, 8(3):247–276, 1995.
5. R. E. Smith, B. A. Dike, R. K. Mehra, B. Ravichandran, and A. El-Fallah.
Classifier systems in combat: Two-sided learning of maneuvers for advanced fighter aircraft. Computer Methods in Applied Mechanics and Engineering, 186:421–437, 2000.
6. R. E. Smith, B. A. Dike, and Stegmann. Fitness inheritance in genetic algorithms. In Proceedings of the 1995 ACM Symposium on Applied Computing, pages 345–350. ACM Press, 1995.
7. S. W. Wilson. ZCS: A zeroth level classifier system. Evolutionary Computation, 2(1):1–18, 1994.