Publications
2025
- [ICML] AlphaQCM: Alpha Discovery in Finance with Distributional Reinforcement Learning. Zhoufan Zhu and Ke Zhu. In Proceedings of the 42nd International Conference on Machine Learning, 2025.
For researchers and practitioners in finance, finding synergistic formulaic alphas is very important but challenging. In this paper, we reconsider the discovery of synergistic formulaic alphas from the viewpoint of sequential decision-making, and conceptualize the entire alpha discovery process as a non-stationary and reward-sparse Markov decision process. To overcome the challenges of non-stationarity and reward-sparsity, we propose the AlphaQCM method, a novel distributional reinforcement learning method designed to search for synergistic formulaic alphas efficiently. The AlphaQCM method first learns the Q function and quantiles via a Q network and a quantile network, respectively. Then, the AlphaQCM method applies the quantiled conditional moment method to learn unbiased variance from the potentially biased quantiles. Guided by the learned Q function and variance, the AlphaQCM method navigates the non-stationarity and reward-sparsity to explore the vast search space of formulaic alphas with high efficacy. Empirical applications to real-world datasets demonstrate that our AlphaQCM method significantly outperforms its competitors, particularly when dealing with large datasets comprising numerous stocks.
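The core statistical trick the abstract alludes to, learning moments from learned quantiles, can be illustrated with the textbook identity E[X] = ∫₀¹ Q(τ) dτ. The sketch below is only that generic identity on a uniform grid of quantile levels; the paper's QCM estimator, which additionally corrects for bias in the quantile estimates, is more involved.

```python
import numpy as np

def _trapezoid(y, x):
    """Plain trapezoidal rule, kept local to avoid NumPy version differences."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum((y[1:] + y[:-1]) / 2.0 * np.diff(x)))

def moments_from_quantiles(quantiles, taus):
    """Approximate mean and variance by integrating the quantile function."""
    mean = _trapezoid(quantiles, taus)                       # E[X] = integral of Q(tau)
    var = _trapezoid((np.asarray(quantiles) - mean) ** 2, taus)
    return mean, var

# Sanity check with Uniform(0, 1), whose quantile function is Q(tau) = tau:
taus = np.linspace(0.0, 1.0, 1001)
mean, var = moments_from_quantiles(taus, taus)               # mean ~ 0.5, var ~ 1/12
```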
@inproceedings{zhu2025alphaqcm, title = {AlphaQCM: Alpha Discovery in Finance with Distributional Reinforcement Learning}, author = {Zhu, Zhoufan and Zhu, Ke}, booktitle = {Proceedings of the 42nd International Conference on Machine Learning}, year = {2025}, }
- [Preprint] Tensor dynamic conditional correlation model: A new way to pursuit "Holy Grail of investing". Cheng Yu, Zhoufan Zhu, and Ke Zhu. 2025.
Style investing creates asset classes (or the so-called "styles") with low correlations, aligning well with the principle of "Holy Grail of investing" in terms of portfolio selection. The returns of styles naturally form a tensor-valued time series, which requires new tools for studying the dynamics of the conditional correlation matrix to facilitate the aforementioned principle. Towards this goal, we introduce a new tensor dynamic conditional correlation (TDCC) model, which is based on two novel treatments: trace-normalization and dimension-normalization. These two normalizations adapt to the tensor nature of the data, and they are necessary except when the tensor data reduce to vector data. Moreover, we provide an easy-to-implement estimation procedure for the TDCC model, and examine its finite sample performance by simulations. Finally, we assess the usefulness of the TDCC model in international portfolio selection across ten global markets and in large portfolio selection for 1800 stocks from the Chinese stock market.
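To make the first of the two normalizations concrete: the generic idea of a trace-normalization is to rescale a covariance-type matrix so that its trace equals its dimension, which pins down an overall scale. This is an illustrative sketch of that generic operation only; the TDCC model's exact definitions (and the companion dimension-normalization for tensor-valued data) are in the paper.

```python
import numpy as np

def trace_normalize(sigma):
    """Rescale a square covariance-type matrix so its trace equals its dimension."""
    sigma = np.asarray(sigma, dtype=float)
    return sigma * (sigma.shape[0] / np.trace(sigma))

S = np.array([[4.0, 1.0],
              [1.0, 9.0]])      # a toy 2x2 covariance matrix
S_norm = trace_normalize(S)     # trace(S_norm) equals the dimension, 2
```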
@misc{yu2025tensor, title = {Tensor dynamic conditional correlation model: A new way to pursuit "Holy Grail of investing"}, author = {Yu, Cheng and Zhu, Zhoufan and Zhu, Ke}, year = {2025}, eprint = {2502.13461}, archiveprefix = {arXiv}, primaryclass = {q-fin.PM}, }
2024
- [Preprint] Enhancement of Price Trend Trading Strategies via Image-induced Importance Weights. Zhoufan Zhu and Ke Zhu. 2024.
We open up the "black-box" to identify the predictive general price patterns in price chart images via the deep learning image analysis techniques. Our identified price patterns lead to the construction of image-induced importance (triple-I) weights, which are applied to weighted moving average the existing price trend trading signals according to their level of importance in predicting price movements. From an extensive empirical analysis on the Chinese stock market, we show that the triple-I weighting scheme can significantly enhance the price trend trading signals for proposing portfolios, with a thoughtful robustness study in terms of network specifications, image structures, and stock sizes. Moreover, we demonstrate that the triple-I weighting scheme is able to propose long-term portfolios from a time-scale transfer learning, enhance the news-based trading strategies through a non-technical transfer learning, and increase the overall strength of numerous trading rules for portfolio selection.
@misc{Zhu2024Enhancement, title = {Enhancement of Price Trend Trading Strategies via Image-induced Importance Weights}, author = {Zhu, Zhoufan and Zhu, Ke}, year = {2024}, eprint = {2408.08483}, archiveprefix = {arXiv}, primaryclass = {q-fin.PM}, }
- [JEF] Big Portfolio Selection by Graph-based Conditional Moments Method. Zhoufan Zhu, Ningning Zhang, and Ke Zhu. Journal of Empirical Finance, 2024.
This paper proposes a new graph-based conditional moments (GRACE) method to do portfolio selection based on thousands of stocks or even more. The GRACE method first learns the conditional quantiles and mean of stock returns via a factor-augmented temporal graph convolutional network, which is guided by the set of stock-to-stock relations as well as the set of factor-to-stock relations. Next, the GRACE method learns the conditional variance, skewness, and kurtosis of stock returns from the learned conditional quantiles via the quantiled conditional moment method. Finally, the GRACE method uses the learned conditional mean, variance, skewness, and kurtosis to construct several performance measures, which are criteria to sort the stocks to proceed with portfolio selection in the well-known 10-decile framework. An application to NASDAQ and NYSE stock markets shows that the GRACE method performs much better than its competitors, particularly when the performance measures are comprised of conditional variance, skewness, and kurtosis.
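The 10-decile framework mentioned at the end is a standard empirical-finance construction: rank all stocks by a performance measure and split them into ten equally sized buckets. A minimal sketch of that sorting step, with a hypothetical score vector standing in for the GRACE performance measures:

```python
import numpy as np

def decile_portfolios(scores):
    """Bucket assets into deciles 0 (lowest score) through 9 (highest)."""
    scores = np.asarray(scores, dtype=float)
    ranks = scores.argsort().argsort()          # 0..n-1, ties broken by position
    return (ranks * 10 // len(scores)).astype(int)

scores = np.arange(20.0)                        # 20 hypothetical stocks
deciles = decile_portfolios(scores)             # two stocks per decile
```

In the 10-decile framework one would then compare, e.g., the returns of decile 9 against decile 0 (a long-short portfolio).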
@article{Zhu2024Graph, title = {Big Portfolio Selection by Graph-based Conditional Moments Method}, author = {Zhu, Zhoufan and Zhang, Ningning and Zhu, Ke}, journal = {Journal of Empirical Finance}, volume = {78}, pages = {101533}, year = {2024}, }
- [JBES] Asset Pricing via the Conditional Quantile Variational Autoencoder. Xunling Yang†, Zhoufan Zhu†, Dong Li, and Ke Zhu. Journal of Business & Economic Statistics, 2024.
We propose a new asset pricing model that is applicable to the big panel of return data. The main idea of this model is to learn the conditional distribution of the return, which is approximated by a step distribution function constructed from conditional quantiles of the return. To study conditional quantiles of the return, we propose a new conditional quantile variational autoencoder (CQVAE) network. The CQVAE network specifies a factor structure for conditional quantiles with latent factors learned from a VAE network and nonlinear factor loadings learned from a “multi-head” network. Under the CQVAE network, we allow the observed covariates such as asset characteristics to guide the structure of latent factors and factor loadings. Furthermore, we provide a two-step estimation procedure for the CQVAE network. Using the learned conditional distribution of return from the CQVAE network, we propose our asset pricing model from the mean of this distribution, and additionally, we use both the mean and variance of this distribution to select portfolios. Finally, we apply our CQVAE asset pricing model to analyze a large 60-year US equity return dataset. Compared with the benchmark conditional autoencoder model, the CQVAE model not only delivers much larger values of out-of-sample total and predictive R-squares, but also earns at least 30.9% higher values of Sharpe ratios for both long-short and long-only portfolios.
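The "step distribution function constructed from conditional quantiles" admits a simple generic illustration: given quantile estimates Q(τ₁) ≤ … ≤ Q(τ_K), the approximating CDF jumps to level τ_k at each Q(τ_k). The toy quantiles and levels below are made up for illustration; they are not the CQVAE's outputs.

```python
import numpy as np

def step_cdf(x, quantiles, taus):
    """CDF of the step distribution that reaches level taus[k] at quantiles[k]."""
    quantiles = np.asarray(quantiles, dtype=float)
    idx = int(np.searchsorted(quantiles, x, side="right"))
    return 0.0 if idx == 0 else float(taus[idx - 1])

quantiles = [-1.0, 0.0, 1.5]      # hypothetical conditional quantiles of a return
taus = [0.25, 0.50, 0.75]         # their quantile levels
p = step_cdf(0.3, quantiles, taus)   # between Q(0.5) and Q(0.75), so 0.5
```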
@article{Yang2024Asset, title = {Asset Pricing via the Conditional Quantile Variational Autoencoder}, author = {Yang, Xunling and Zhu, Zhoufan and Li, Dong and Zhu, Ke}, journal = {Journal of Business \& Economic Statistics}, volume = {42}, number = {2}, pages = {681-694}, year = {2024}, publisher = {Taylor \& Francis}, }
- [TNNLS] Monotonic Quantile Network for Worst-Case Offline Reinforcement Learning. Chenjia Bai, Ting Xiao, Zhoufan Zhu, Lingxiao Wang, and 5 more authors. IEEE Transactions on Neural Networks and Learning Systems, 2024.
A key challenge in offline reinforcement learning (RL) is how to ensure the learned offline policy is safe, especially in safety-critical domains. In this article, we focus on learning a distributional value function in offline RL and optimizing a worst-case criterion of returns. However, optimizing a distributional value function in offline RL can be hard, since the crossing quantile issue is serious, and the distribution shift problem needs to be addressed. To this end, we propose monotonic quantile network (MQN) with conservative quantile regression (CQR) for risk-averse policy learning. First, we propose an MQN to learn the distribution over returns with non-crossing guarantees of the quantiles. Then, we perform CQR by penalizing the quantile estimation for out-of-distribution (OOD) actions to address the distribution shift in offline RL. Finally, we learn a worst-case policy by optimizing the conditional value-at-risk (CVaR) of the distributional value function. Furthermore, we provide theoretical analysis of the fixed-point convergence in our method. We conduct experiments in both risk-neutral and risk-sensitive offline settings, and the results show that our method obtains safe and conservative behaviors in robotic locomotion tasks.
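The worst-case criterion in the final step, CVaR of a distributional value function, has a simple discrete form when the distribution is represented by quantiles: average the quantiles whose levels fall in the lower α-tail. The sketch below shows that generic computation with toy numbers, not the MQN/CQR training pipeline itself.

```python
import numpy as np

def cvar_from_quantiles(quantiles, taus, alpha):
    """Discrete lower-tail CVaR: mean of the quantiles at levels <= alpha."""
    quantiles = np.asarray(quantiles, dtype=float)
    taus = np.asarray(taus, dtype=float)
    return float(quantiles[taus <= alpha].mean())

taus = (np.arange(10) + 0.5) / 10.0        # midpoint levels 0.05, 0.15, ..., 0.95
quantiles = np.arange(10.0)                # a toy non-decreasing quantile set
worst_case = cvar_from_quantiles(quantiles, taus, alpha=0.25)   # mean of 0, 1, 2
```

A risk-averse policy would be trained to maximize `worst_case` rather than the ordinary mean of the return distribution.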
@article{Bai2024Monotonic, author = {Bai, Chenjia and Xiao, Ting and Zhu, Zhoufan and Wang, Lingxiao and Zhou, Fan and Garg, Animesh and He, Bin and Liu, Peng and Wang, Zhaoran}, journal = {IEEE Transactions on Neural Networks and Learning Systems}, title = {Monotonic Quantile Network for Worst-Case Offline Reinforcement Learning}, year = {2024}, volume = {35}, number = {7}, pages = {8954-8968}, }
2023
- [ICML] Variance Control for Distributional Reinforcement Learning. Qi Kuang†, Zhoufan Zhu†, Liwen Zhang, and Fan Zhou. In Proceedings of the 40th International Conference on Machine Learning, 2023.
Although distributional reinforcement learning (DRL) has been widely examined in the past few years, very few studies investigate the validity of the obtained Q-function estimator in the distributional setting. To fully understand how the approximation errors of the Q-function affect the whole training process, we conduct an error analysis and theoretically show how to reduce both the bias and the variance of the error terms. With this new understanding, we construct a new estimator, Quantiled Expansion Mean (QEM), and introduce a new DRL algorithm (QEMRL) from the statistical perspective. We extensively evaluate our QEMRL algorithm on a variety of Atari and MuJoCo benchmark tasks and demonstrate that QEMRL achieves significant improvement over baseline algorithms in terms of sample efficiency and convergence performance.
@inproceedings{Kuang2023Variance, author = {Kuang, Qi and Zhu, Zhoufan and Zhang, Liwen and Zhou, Fan}, title = {Variance Control for Distributional Reinforcement Learning}, year = {2023}, publisher = {JMLR.org}, articleno = {736}, numpages = {22}, booktitle = {Proceedings of the 40th International Conference on Machine Learning}, }
2022
- [CJS] Shrinkage Quantile Regression for Panel Data with Multiple Structural Breaks. Liwen Zhang, Zhoufan Zhu, Xingdong Feng, and Yong He. Canadian Journal of Statistics, 2022.
We consider a shrinkage quantile regression model for high-dimensional panel data with multiple structural breaks. The structural breaks are assumed to be common across all individuals, but may vary across different quantile levels while sharing an identical location shift effect. We impose an L1 penalty on the individual effects and an L1 -type fusion penalty to estimate both the slope coefficients and the structural breaks by combining information at multiple quantile levels. The proposed method can detect “partial” changes of the regression coefficients and consistently estimate both the number and dates of the breaks with probability tending to 1. We establish the asymptotic properties of the proposed regression coefficient estimators as well as their post-selection counterparts, where the dimensionality of the covariates is allowed to diverge. Simulation results demonstrate that the proposed method works well in finite-sample cases. Using the proposed method, we obtain many interesting results by analyzing a dataset concerning environmental Kuznets curves.
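The loss underlying any quantile regression, including the penalized panel version above, is the check (pinball) loss, which penalizes under- and over-prediction asymmetrically according to the quantile level τ. A minimal sketch of that standard loss (the paper's estimator adds the L1 and fusion penalties on top of it):

```python
import numpy as np

def pinball_loss(y, q, tau):
    """Check loss: tau*(y-q) when y >= q, (tau-1)*(y-q) otherwise, averaged."""
    u = np.asarray(y, dtype=float) - q
    return float(np.mean(np.maximum(tau * u, (tau - 1.0) * u)))

# At tau = 0.5 the check loss is half the mean absolute error:
loss = pinball_loss([1.0, 2.0, 3.0], q=2.0, tau=0.5)   # mean of 0.5, 0, 0.5
```

Minimizing this loss over q yields the empirical τ-quantile, which is why summing it over several τ levels pools information across quantiles.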
@article{Zhang2022Shrinkage, author = {Zhang, Liwen and Zhu, Zhoufan and Feng, Xingdong and He, Yong}, title = {Shrinkage Quantile Regression for Panel Data with Multiple Structural Breaks}, journal = {Canadian Journal of Statistics}, volume = {50}, number = {3}, pages = {820-851}, year = {2022}, }
2021
- [IJCAI] Non-decreasing Quantile Function Network with Efficient Exploration for Distributional Reinforcement Learning. Fan Zhou, Zhoufan Zhu, Qi Kuang, and Liwen Zhang. In Proceedings of the 30th International Joint Conference on Artificial Intelligence, 2021.
Although distributional reinforcement learning (DRL) has been widely examined in the past few years, there are two open questions people are still trying to address. One is how to ensure the validity of the learned quantile function; the other is how to efficiently utilize the distribution information. This paper attempts to provide some new perspectives to encourage future in-depth studies in these two fields. We first propose a non-decreasing quantile function network (NDQFN) to guarantee the monotonicity of the obtained quantile estimates and then design a general exploration framework called distributional prediction error (DPE) for DRL which utilizes the entire distribution of the quantile function. We not only discuss the theoretical necessity of our method but also show the performance gain it achieves in practice by comparing with some competitors on Atari 2600 games, especially on some hard-exploration games.
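One common way to guarantee non-crossing quantiles, shown below as a generic sketch rather than NDQFN's actual architecture, is to let the network output an unconstrained first value plus strictly positive increments (e.g. via softplus) that are cumulatively summed:

```python
import numpy as np

def non_crossing_quantiles(raw):
    """First output is free; each later quantile adds a positive softplus step."""
    raw = np.asarray(raw, dtype=float)
    steps = np.log1p(np.exp(raw[1:]))            # softplus, always > 0
    return np.concatenate([raw[:1], raw[0] + np.cumsum(steps)])

# Arbitrary unconstrained network outputs map to a monotone quantile vector:
q = non_crossing_quantiles(np.array([0.3, -2.0, 1.0, -0.5]))
```

Because every step is strictly positive, the resulting quantile estimates can never cross, regardless of the raw network outputs.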
@inproceedings{Zhou2021Non, author = {Zhou, Fan and Zhu, Zhoufan and Kuang, Qi and Zhang, Liwen}, booktitle = {Proceedings of the 30th International Joint Conference on Artificial Intelligence}, pages = {3455-3461}, title = {Non-decreasing Quantile Function Network with Efficient Exploration for Distributional Reinforcement Learning}, year = {2021}, }