VQ-DQN

Reference: O. Lockwood and M. Si. Reinforcement learning with quantum variational circuit. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 16, pages 245–251. 2020. available here

Data: Graph (PDF), Raw CSV data

Description: Results reproduced using the published source code. The light blue lines show the total reward collected per episode under each agent's greedy policy. The light red lines show a moving average over the (up to) 20 previous episode returns. The bold red and bold blue lines show the corresponding results averaged over five experiments.
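The smoothing and averaging described above can be sketched as follows. This is a minimal illustration, not the code used to produce the figure: the function names and the sample returns are hypothetical, and it assumes that the window includes the current episode and simply shrinks at the start of a run, which the description does not state explicitly.

```python
from statistics import fmean


def moving_average(returns, window=20):
    """Trailing mean over the most recent `window` returns.

    Assumption: the window includes the current episode and is shorter
    for the first episodes ("up to" `window` values).
    """
    averaged = []
    for i in range(len(returns)):
        start = max(0, i + 1 - window)
        averaged.append(fmean(returns[start:i + 1]))
    return averaged


def mean_over_runs(runs):
    """Element-wise mean across several runs of equal length."""
    return [fmean(values) for values in zip(*runs)]


# Hypothetical per-episode returns for two runs (not data from the figures).
run_a = [10.0, 20.0, 30.0, 40.0]
run_b = [20.0, 20.0, 20.0, 20.0]

smoothed = moving_average(run_a, window=2)
averaged = mean_over_runs([run_a, run_b])
```

With the raw CSV data, each run would be one column or file of episode returns; the layout of those files is not described here, so loading is left out of the sketch.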

Reference: A. Skolik, S. Jerbi, V. Dunjko. Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning. 2021. arXiv preprint arXiv:2103.15084

Data: Graph (PDF), Raw CSV data

Description: Replication of the hybrid quantum-classical model (with data re-uploading) and the pure quantum model (with data re-uploading). In the graph, results are averaged over three experiments each.

References:

[1] O. Lockwood and M. Si. Reinforcement learning with quantum variational circuit. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 16, pages 245–251. 2020. available here
[2] A. Skolik, S. Jerbi, V. Dunjko. Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning. 2021. arXiv preprint arXiv:2103.15084

Data: Graph (PDF), Raw CSV data

Description: The graph shows the validation returns obtained with the VQC-layer structure of Ref. [1] (top) and Ref. [2] (bottom) for different input encoding strategies. The columns correspond to the extraction strategy (left to right: Global Scaling (GS), Global Scaling with Quantum Pooling (GSP), Local Scaling (LS)). Results are averaged over five experiments each.

References:

[1] O. Lockwood and M. Si. Reinforcement learning with quantum variational circuit. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 16, pages 245–251. 2020. available here
[2] A. Skolik, S. Jerbi, V. Dunjko. Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning. 2021. arXiv preprint arXiv:2103.15084

Data:

Description:

The graphs show the validation returns for the hyperparameter constellations in the baseline configurations. The figures consider the results for the VQC architectures described in Ref. [1] and Ref. [2].

In the graph "Best-performing hyperparameter constellations", columns correspond to the extraction method (left to right: Global Scaling (GS), Global Scaling with Quantum Pooling (GSP), Local Scaling (LS)) and rows correspond to the input encoding strategy (Continuous (C), Scaled & Continuous (SC)).

In the graphs "Full Cross Validation", columns correspond to eta_start (left to right: 0.001, 0.01, 0.1) and eta_duration (left to right: 2000, 4000); rows correspond to epsilon_duration (10000, 20000, 30000) and gamma (0.99, 0.999).
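The hyperparameter grid behind the "Full Cross Validation" graphs can be enumerated as in the sketch below. The parameter names and values are taken from the description; the dictionary layout and the variable names are illustrative assumptions, not the structure of the original code.

```python
from itertools import product

# Values as listed in the description of the "Full Cross Validation" graphs.
grid = {
    "eta_start": [0.001, 0.01, 0.1],
    "eta_duration": [2000, 4000],
    "epsilon_duration": [10000, 20000, 30000],
    "gamma": [0.99, 0.999],
}

# One dictionary per hyperparameter constellation (Cartesian product).
constellations = [
    dict(zip(grid, values)) for values in product(*grid.values())
]

print(len(constellations))  # 3 * 2 * 3 * 2 = 36
```

Each of the 36 constellations corresponds to one panel position: the eta_start and eta_duration values select the column, the epsilon_duration and gamma values select the row.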

References:

[1] O. Lockwood and M. Si. Reinforcement learning with quantum variational circuit. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 16, pages 245–251. 2020. available here
[2] A. Skolik, S. Jerbi, V. Dunjko. Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning. 2021. arXiv preprint arXiv:2103.15084

Data:

Description:

The graphs show the validation returns for the hyperparameter constellations in the baseline configurations with data re-uploading. The figures consider the results for the VQC architectures described in Ref. [1] and Ref. [2].

In the graph "Best-performing hyperparameter constellations", columns correspond to the extraction method (left to right: Global Scaling (GS), Global Scaling with Quantum Pooling (GSP), Local Scaling (LS)) and rows correspond to the input encoding strategy (Continuous (C), Scaled & Continuous (SC)).

In the graphs "Full Cross Validation", columns correspond to eta_start (left to right: 0.001, 0.01, 0.1) and eta_duration (left to right: 2000, 4000); rows correspond to epsilon_duration (10000, 20000, 30000) and gamma (0.99, 0.999).

Reference: IBM Quantum

Data: Graph (PDF), Raw CSV data

Description: The graph shows the results of our validation run on ibmq_ehningen.

Reference: [1] A. Skolik, S. Jerbi, V. Dunjko. Quantum agents in the Gym: a variational quantum algorithm for deep Q-learning. 2021. arXiv preprint arXiv:2103.15084

Data: Graph (PDF), Raw CSV data

Description:

The graph shows a comparison between the Variational-Quantum Deep Q-Network (VQ-DQN) and a classical neural network (NN), averaged over 30 different agents.

For the VQ-DQN, we used the configuration: [1]/GS/SC and the hyperparameter constellation: eta_start=0.01, eta_duration=4000, epsilon_duration=30000, gamma=0.99.

For the NN, we used 58 parameters and the hyperparameter constellation: eta_start=0.1, eta_duration=4000, epsilon_duration=20000, gamma=0.99.
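For reference, the two hyperparameter constellations above can be written down side by side as in the following sketch. The `Constellation` class and the `differing_fields` helper are hypothetical names introduced only for illustration; the values are those stated in the description.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Constellation:
    """Hypothetical container for one hyperparameter constellation."""
    eta_start: float
    eta_duration: int
    epsilon_duration: int
    gamma: float


# Values as stated in the description of the comparison.
vq_dqn = Constellation(eta_start=0.01, eta_duration=4000,
                       epsilon_duration=30000, gamma=0.99)
nn = Constellation(eta_start=0.1, eta_duration=4000,
                   epsilon_duration=20000, gamma=0.99)


def differing_fields(a, b):
    """Return {field name: (value in a, value in b)} where a and b differ."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(a)
        if getattr(a, f.name) != getattr(b, f.name)
    }


diff = differing_fields(vq_dqn, nn)
```

The two agents share eta_duration and gamma and differ only in eta_start and epsilon_duration.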

Data: Graph (PDF), Raw CSV data

Description: The graph shows a set of runs that were trained for the maximum number of episodes. For these runs, we used different parameter sets from our best-performing cross-validation runs. The results are averaged over 18 experiments.

Figure 1 - Reproduced results from Lockwood and Si

Figure 2 - Replicated results from Skolik et al.

Figure 3 - Encoding and Extraction

Figure 4 - Cross Validation - Baseline

Figure 5 - Cross Validation - Baseline with data re-uploading

Figure 6 - Validation on IBM Quantum Device

Figure 7 - Comparison Neural Network - Variational-Quantum Deep Q-Network

Figure 8 - Convergence behaviour without early stopping criterion