Rewarding Air Combat Behavior in Training Simulations

Armon Toubman, Jan Joris Roessingh, Pieter Spronck, Aske Plaat, H.J. van den Herik

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review

    Abstract

    Computer generated forces (CGFs) inhabiting air combat training simulations must show realistic and adaptive behavior to effectively perform their roles as allies and adversaries. In earlier work, behavior for these CGFs was successfully generated using reinforcement learning. However, due to missile hits being subject to chance (a.k.a. the probability-of-kill), the CGFs have in certain cases been improperly rewarded and punished. We surmise that taking this probability-of-kill into account in the reward function will improve performance. To remedy the false rewards and punishments, a new reward function is proposed that rewards agents based on the expected outcome of their actions. Tests show that the use of this function significantly increases the performance of the CGFs in various scenarios, compared to the previous reward function and a naïve baseline. Based on the results, the new reward function allows the CGFs to generate more intelligent behavior, which enables better training simulations.
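    The abstract's central idea — replacing a reward tied to the chance outcome of a missile shot with one based on its expected outcome — can be sketched as follows. This is an illustrative reconstruction, not the paper's actual formulation; the function names and reward values (+1 for a hit, -1 for a miss) are assumptions.

    ```python
    def outcome_reward(hit: bool, r_hit: float = 1.0, r_miss: float = -1.0) -> float:
        """Naive reward: depends on the stochastic result of the shot,
        so an agent can be punished for a well-taken shot that misses."""
        return r_hit if hit else r_miss

    def expected_reward(pk: float, r_hit: float = 1.0, r_miss: float = -1.0) -> float:
        """Expected-outcome reward: weights both outcomes by the
        probability-of-kill (pk), so the reward reflects the quality
        of the firing decision rather than the roll of the dice."""
        return pk * r_hit + (1.0 - pk) * r_miss

    # A shot with pk = 0.8 earns the same (positive) reward whether or not
    # the missile happens to hit:
    print(expected_reward(0.8))  # approximately 0.6
    ```

    Under this sketch, two agents that take identical shots receive identical rewards, which removes the variance that the abstract identifies as the source of false rewards and punishments.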
    Original language: English
    Title of host publication: Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics
    Publisher: IEEE Press
    Publication status: Published - 2015
    Event: IEEE International Conference on Systems, Man and Cybernetics 2015 - Hong Kong, China
    Duration: 9 Oct 2015 – 12 Oct 2015

    Conference

    Conference: IEEE International Conference on Systems, Man and Cybernetics 2015
    Country: China
    City: Hong Kong
    Period: 9/10/15 – 12/10/15


    Cite this

    Toubman, A., Roessingh, J. J., Spronck, P., Plaat, A., & van den Herik, H. J. (2015). Rewarding Air Combat Behavior in Training Simulations. In Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics IEEE Press.