Volume 11, Number 5/6

Intelligent Portfolio Management via NLP Analysis of Financial 10-K Statements


Purva Singh, VIT University, India


This paper examines whether the stability of sentiment in a company’s 10-K reports over time can predict the company’s future mean returns. A diverse portfolio of stocks was selected to test this hypothesis. The proposed framework downloads the companies’ 10-K reports from the SEC’s EDGAR database and passes them through a preprocessing pipeline that extracts the critical sections of each filing for NLP analysis. Using the Loughran and McDonald sentiment word list, the framework generates sentiment TF-IDF vectors from the 10-K documents, calculates the cosine similarity between two consecutive 10-K reports, and proposes this cosine similarity as the alpha factor. To analyze how effectively the alpha factor predicts future returns, the framework uses the alphalens library to perform factor return analysis and turnover analysis and to compare the Sharpe ratios of potential alpha factors. The results show a strong correlation between the sentiment stability of the portfolio’s 10-K statements and its future mean returns. For the benefit of the research community, the code and Jupyter notebooks related to this paper have been open-sourced on GitHub¹.
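The core computation described above, sentiment TF-IDF followed by cosine similarity between consecutive filings, can be illustrated with a minimal sketch. This is not the paper's implementation: the four-word `NEGATIVE_WORDS` list is a hypothetical stand-in for one Loughran and McDonald sentiment category, the smoothed IDF formula is an assumed common variant, and whitespace tokenization replaces the preprocessing pipeline.

```python
import math
from collections import Counter

# Hypothetical stand-in for one Loughran-McDonald sentiment category;
# the real word list is far larger.
NEGATIVE_WORDS = ["loss", "decline", "litigation", "impairment"]


def term_counts(text, vocabulary):
    """Count how often each sentiment word occurs in a document."""
    counts = Counter(t for t in text.lower().split() if t in vocabulary)
    return [counts[word] for word in vocabulary]


def tfidf_vectors(docs, vocabulary):
    """Build one TF-IDF vector per document, restricted to the vocabulary.

    Uses a smoothed IDF, log((1 + n) / (1 + df)) + 1, which is an
    assumption of this sketch rather than the paper's stated formula.
    """
    n = len(docs)
    counts = [term_counts(doc, vocabulary) for doc in docs]
    idf = []
    for j in range(len(vocabulary)):
        df = sum(1 for c in counts if c[j] > 0)
        idf.append(math.log((1 + n) / (1 + df)) + 1)
    return [[c[j] * idf[j] for j in range(len(vocabulary))] for c in counts]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def consecutive_similarities(docs, vocabulary):
    """Similarity between each filing and the next one in time order."""
    vecs = tfidf_vectors(docs, vocabulary)
    return [cosine_similarity(vecs[i], vecs[i + 1]) for i in range(len(vecs) - 1)]


# Toy filings for one company, ordered by year (illustrative text only).
filings = [
    "loss and decline in revenue",
    "loss and decline in revenue",
    "litigation and impairment charges",
]
similarities = consecutive_similarities(filings, NEGATIVE_WORDS)
```

In this toy input the first two filings use the same sentiment words, so their similarity is close to 1, while the third shares none with the second, giving a similarity of 0. A series of such values per company and date is the kind of input a factor analysis tool would consume.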


Natural Language Processing, stock market, textual analysis, TF-IDF, cosine similarity, intelligent portfolio management.