## Abstract

We analyze a fixed-point algorithm for reinforcement learning (RL) of optimal portfolio mean-variance preferences in the setting of multivariate generalized autoregressive conditional-heteroskedasticity (MGARCH) with a small penalty on trading. A numerical solution is obtained using a neural network (NN) architecture within a recursive RL loop. A fixed-point theorem proves that the NN approximation error has a big-O bound that we can reduce by increasing the number of NN parameters. The functional form of the trading penalty has a parameter ϵ > 0 that controls the magnitude of transaction costs. When ϵ is small, we can implement an NN algorithm based on the expansion of the solution in powers of ϵ. This expansion has a base term equal to a myopic solution with an explicit form, and a first-order correction term that we compute in the RL loop. Our expansion-based algorithm is stable, allows for fast computation, and outputs a solution that shows positive testing performance.
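As a reading aid, the expansion described in the abstract can be sketched schematically. The symbols below are assumptions introduced here for illustration and are not the paper's notation: $\pi^{\epsilon}_t$ is the portfolio under trading penalty ϵ, $\pi^{(0)}_t$ the myopic base term, $\pi^{(1)}_t$ the first-order correction, $\gamma$ a risk-aversion parameter, and $\mu_t$, $\Sigma_t$ the conditional mean and covariance of returns under the MGARCH model. The explicit form given for $\pi^{(0)}_t$ is the standard one-period mean-variance solution, assumed here as a plausible instance; the paper's explicit form may differ.

```latex
% Schematic small-epsilon expansion (illustrative notation, not the paper's)
\pi^{\epsilon}_t \;=\; \pi^{(0)}_t \;+\; \epsilon\,\pi^{(1)}_t \;+\; O(\epsilon^{2}),
\qquad
% Assumed myopic base term: standard one-period mean-variance weights
\pi^{(0)}_t \;=\; \frac{1}{\gamma}\,\Sigma_t^{-1}\mu_t .
```

Under this reading, only the correction term $\pi^{(1)}_t$ has to be learned in the RL loop, since the base term is available in closed form.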

Original language | English (US) |
---|---|
Pages (from-to) | 16774-16792 |
Number of pages | 19 |
Journal | IEEE Access |
Volume | 11 |
DOIs | |
State | Published - 2023 |

## Keywords

- Heteroskedasticity
- MGARCH
- deep neural networks
- fixed-point algorithms
- reinforcement learning

## ASJC Scopus subject areas

- General Computer Science
- General Materials Science
- General Engineering
- Electrical and Electronic Engineering