As strong advocates of fully reproducible research, we make all our derived results available. The complete code required to reproduce the results is available on GitHub. Raw data, together with supplementary graphs, are available for download below.
Reference: O. Lockwood and M. Si. Reinforcement learning with quantum variational circuit. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 16, pages 245–251. 2020. available here
Data: Graph (PDF), Raw CSV data
Description: Results reproduced using the published source code. The light blue lines indicate the total reward collected in an episode, using the greedy policy for each agent. The light red lines represent a moving average of the (up to) 20 previous episode returns. The strong red and strong blue lines show the corresponding averages over five experiments.
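The smoothing and averaging described above can be sketched as follows. This is a minimal illustration, not the published plotting code: the function names and the sample returns are made up, and we assume that the window ends at the current episode and simply shrinks while fewer than 20 returns are available.

```python
from statistics import mean


def trailing_average(returns, window=20):
    """Mean of the last `window` returns at each episode (fewer at the start)."""
    averaged = []
    for i in range(len(returns)):
        start = max(0, i - window + 1)  # window shrinks for early episodes
        averaged.append(mean(returns[start:i + 1]))
    return averaged


def average_over_runs(runs):
    """Episode-wise mean across runs of equal length."""
    return [mean(episode) for episode in zip(*runs)]


# Made-up returns for three short runs (illustration only).
runs = [
    [10.0, 20.0, 30.0],
    [20.0, 40.0, 60.0],
    [30.0, 60.0, 90.0],
]

# Smooth each run, then average the smoothed curves across runs.
smoothed_runs = [trailing_average(run, window=2) for run in runs]
mean_curve = average_over_runs(smoothed_runs)
```

Because both steps are plain averages, smoothing each run first and averaging afterwards gives the same curve as the reverse order when all runs have the same length.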
Reference: A. Skolik, S. Jerbi, V. Dunjko. Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning. 2021. arXiv preprint arXiv:2103.15084
Data: Graph (PDF), Raw CSV data
Description: Replication of the hybrid quantum-classical model (with data re-uploading) and the pure quantum model (with data re-uploading). In the graph, results are averaged over three experiments each.
References:
- [1] O. Lockwood and M. Si. Reinforcement learning with quantum variational circuit. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 16, pages 245–251. 2020. available here
- [2] A. Skolik, S. Jerbi, V. Dunjko. Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning. 2021. arXiv preprint arXiv:2103.15084
Data: Graph (PDF), Raw CSV data
Description: The graph shows the validation returns obtained with the VQC-layer structure of Ref. [1] (top) and Ref. [2] (bottom) with different input encoding strategies. The columns correspond to the extraction strategy (from left to right: Global Scaling (GS), Global Scaling with Quantum Pooling (GSP), Local Scaling (LS)). Results are averaged over five experiments each.
References:
- [1] O. Lockwood and M. Si. Reinforcement learning with quantum variational circuit. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 16, pages 245–251. 2020. available here
- [2] A. Skolik, S. Jerbi, V. Dunjko. Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning. 2021. arXiv preprint arXiv:2103.15084
Data:
- Graph (PDF) - Best-performing hyperparameter constellations
- Graph (PDF) - Full Cross-Validation, Configuration: Lockwood (VQC-Architecture) - GS (Extraction) - C (Encoding)
- Graph (PDF) - Full Cross-Validation, Configuration: Lockwood - GS - SC
- Graph (PDF) - Full Cross-Validation, Configuration: Lockwood - GSP - C
- Graph (PDF) - Full Cross-Validation, Configuration: Lockwood - GSP - SC
- Graph (PDF) - Full Cross-Validation, Configuration: Skolik et al. (VQC-Architecture) - GS - C
- Graph (PDF) - Full Cross-Validation, Configuration: Skolik et al. - GS - SC
- Graph (PDF) - Full Cross-Validation, Configuration: Skolik et al. - GSP - C
- Graph (PDF) - Full Cross-Validation, Configuration: Skolik et al. - GSP - SC
- Raw CSV data
Description:
The graphs show the validation returns for the hyperparameter constellations in the baseline configurations. The figures consider the results for the VQC architectures described in Ref. [1] and Ref. [2].
In the graph "Best-performing hyperparameter constellations", columns correspond to the extraction method (from left to right: Global Scaling (GS), Global Scaling with Quantum Pooling (GSP), Local Scaling (LS)); rows correspond to the input encoding strategy (Continuous (C), Scaled & Continuous (SC)).
In the graphs "Full Cross-Validation", columns correspond to eta_start (from left to right: 0.001, 0.01, 0.1) and eta_duration (from left to right: 2000, 4000); rows correspond to epsilon_duration (10000, 20000, 30000) and gamma (0.99, 0.999).
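The hyperparameter values listed above span a grid of 3 × 2 × 3 × 2 = 36 constellations per configuration, assuming every combination is evaluated. The sketch below enumerates that grid; the helper name `cross_validation_grid` and the ordering of the panels (which parameter varies fastest) are our assumptions for illustration and are not taken from the published code.

```python
from itertools import product

# Values as listed in the description of the "Full Cross-Validation" graphs.
ETA_START = (0.001, 0.01, 0.1)
ETA_DURATION = (2000, 4000)
EPSILON_DURATION = (10000, 20000, 30000)
GAMMA = (0.99, 0.999)


def cross_validation_grid():
    """Enumerate all constellations, one row of panels after the other.

    Rows combine epsilon_duration and gamma, columns combine eta_start
    and eta_duration. The nesting order is an assumption.
    """
    rows = product(EPSILON_DURATION, GAMMA)
    columns = list(product(ETA_START, ETA_DURATION))
    grid = []
    for (epsilon_duration, gamma), (eta_start, eta_duration) in product(rows, columns):
        grid.append({
            "eta_start": eta_start,
            "eta_duration": eta_duration,
            "epsilon_duration": epsilon_duration,
            "gamma": gamma,
        })
    return grid


grid = cross_validation_grid()
print(len(grid))  # number of panels per configuration
```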
References:
- [1] O. Lockwood and M. Si. Reinforcement learning with quantum variational circuit. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 16, pages 245–251. 2020. available here
- [2] A. Skolik, S. Jerbi, V. Dunjko. Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning. 2021. arXiv preprint arXiv:2103.15084
Data:
- Graph (PDF) - Best-performing hyperparameter constellations
- Graph (PDF) - Full Cross-Validation, Configuration: Lockwood (VQC-Architecture) - GS (Extraction) - C (Encoding)
- Graph (PDF) - Full Cross-Validation, Configuration: Lockwood - GS - SC
- Graph (PDF) - Full Cross-Validation, Configuration: Lockwood - GSP - C
- Graph (PDF) - Full Cross-Validation, Configuration: Lockwood - GSP - SC
- Graph (PDF) - Full Cross-Validation, Configuration: Skolik et al. (VQC-Architecture) - GS - C
- Graph (PDF) - Full Cross-Validation, Configuration: Skolik et al. - GS - SC
- Graph (PDF) - Full Cross-Validation, Configuration: Skolik et al. - GSP - C
- Graph (PDF) - Full Cross-Validation, Configuration: Skolik et al. - GSP - SC
- Raw CSV data
Description:
The graphs show the validation returns for the hyperparameter constellations in the baseline configurations with data re-uploading. The figures consider the results for the VQC architectures described in Ref. [1] and Ref. [2].
In the graph "Best-performing hyperparameter constellations", columns correspond to the extraction method (from left to right: Global Scaling (GS), Global Scaling with Quantum Pooling (GSP), Local Scaling (LS)); rows correspond to the input encoding strategy (Continuous (C), Scaled & Continuous (SC)).
In the graphs "Full Cross-Validation", columns correspond to eta_start (from left to right: 0.001, 0.01, 0.1) and eta_duration (from left to right: 2000, 4000); rows correspond to epsilon_duration (10000, 20000, 30000) and gamma (0.99, 0.999).
Reference: IBM Quantum
Data: Graph (PDF), Raw CSV data
Description: The graph shows the results of our validation run on ibmq_ehningen.
Reference: [1] A. Skolik, S. Jerbi, V. Dunjko. Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning. 2021. arXiv preprint arXiv:2103.15084
Data: Graph (PDF), Raw CSV data
Description:
The graph shows a comparison between the Variational-Quantum Deep Q-Network (VQ-DQN) and a classical neural network (NN), averaged over 30 different agents.
For the VQ-DQN, we used the configuration [1]/GS/SC and the hyperparameter constellation eta_start=0.01, eta_duration=4000, epsilon_duration=30000, gamma=0.99.
For the NN, we used 58 parameters and the hyperparameter constellation eta_start=0.1, eta_duration=4000, epsilon_duration=20000, gamma=0.99.
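For quick reference, the two constellations above can be written down side by side and compared. The snippet is only a sketch: the dictionary layout and variable names are our own and do not mirror the published code.

```python
# Hyperparameter constellations as stated in the description above.
vq_dqn = {"eta_start": 0.01, "eta_duration": 4000,
          "epsilon_duration": 30000, "gamma": 0.99}
nn = {"eta_start": 0.1, "eta_duration": 4000,
      "epsilon_duration": 20000, "gamma": 0.99}

# Collect the settings in which the two agents differ: (VQ-DQN value, NN value).
differing = {name: (vq_dqn[name], nn[name])
             for name in vq_dqn if vq_dqn[name] != nn[name]}
print(differing)
```

The two agents share eta_duration and gamma and differ in eta_start and epsilon_duration.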
Data: Graph (PDF), Raw CSV data
Description: The graph shows a set of results from runs that continued for the maximum number of episodes. For these runs, we used different sets of parameters from our best-performing cross-validation runs. The results are averaged over 18 experiments.