[R-SIG-Finance] Principal Component Analysis in Credit Risk

Patrick Caldon Patrick.Caldon at morningstar.com
Thu Sep 17 01:59:34 CEST 2015

Hi Amelia,

Such systems are doable.

* 250 companies might be a bit light? I'd start with more
* Analyst reports are ok, but consider other text-based data sources; you will get more mileage there.
* PCA might not be the best starting point.  Either get a specialized text-based clustering methodology, e.g. a topic model, or use a regression/classification technique that is robust to very high-dimensional problems.  I would start with the latter given your brief description.
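
As a rough illustration of the second option (a sketch, not a recipe anyone in this thread has tested), word counts fed into a penalized logistic regression might look like the code below. It is written in plain Python so that it is self-contained; the toy reports, the default labels, the penalty `lam`, the step size and every function name are made-up assumptions. In R the same steps would usually go through a document-term matrix (e.g. the tm package) and a penalized regression package such as glmnet.

```python
import math
from collections import Counter

# Hypothetical toy corpus: (report text, 1 = defaulted, 0 = did not default).
reports = [
    ("covenant breach and liquidity shortfall reported", 1),
    ("liquidity shortfall forces covenant waiver", 1),
    ("strong cash flow and stable margins", 0),
    ("stable margins with strong order book", 0),
    ("cash flow stable and order book strong", 0),
]

def tokenize(text):
    # Deliberately naive: lower-case and split on whitespace.
    return text.lower().split()

# Vocabulary = one column per distinct word (this is what gets very wide).
vocab = sorted({w for text, _ in reports for w in tokenize(text)})
index = {w: j for j, w in enumerate(vocab)}

def vectorize(text):
    # Bag-of-words counts; words outside the vocabulary are ignored.
    row = [0.0] * len(vocab)
    for w, c in Counter(tokenize(text)).items():
        if w in index:
            row[index[w]] = float(c)
    return row

X = [vectorize(t) for t, _ in reports]
y = [label for _, label in reports]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_ridge_logistic(X, y, lam=0.1, lr=0.1, steps=2000):
    # Gradient descent on mean log-loss + (lam / 2) * ||w||^2.
    # The intercept b is not penalized.
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(steps):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                if xj:
                    grad_w[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

w, b = fit_ridge_logistic(X, y)

def predict_proba(text):
    # Estimated default probability for a new report.
    x = vectorize(text)
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Words with the largest positive coefficients push towards "default".
ranked = sorted(zip(vocab, w), key=lambda t: t[1], reverse=True)
for word, coef in ranked[:3]:
    print(f"{word:10s} {coef:+.3f}")
```

With a handful of documents the coefficients mean very little. The point is only the shape of the pipeline: count words, penalize the fit so that thousands of word columns do not overfit a few defaults, then read off the largest coefficients and check them on held-out reports.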

Good luck!


Dear Forum,

I need some direction and guidance. This may sound like a vague question, but I will try to be as specific as possible.

I recently came to know about text analysis in R. Assume I have analyst reports on, say, 250 companies. I am aware that out of these 25 companies, 5 companies have defaulted. I have been asked to apply principal component analysis to each of these 25 companies to find the words which, if they occur in, say, the 26th company's analyst report, will give a clear indication that this company will default.

I understand this is really a vague question. To begin with, can Principal Component Analysis be used for text, and if yes, can someone give me some direction or a source?


