Abstract
Methodologies for failure assessment frequently rely on historical failure modes, causes, and recommendations for prevention. Meanwhile, growing databases of narrative-based lessons remain under-utilized because of their size. Advances in natural language processing (NLP) enable unsupervised extraction of this knowledge. We present a methodology for (1) identifying relevant information using a term frequency-inverse document frequency (TF-IDF) classifier and (2) extracting knowledge for failure assessment using hierarchical latent Dirichlet allocation (hLDA), a hierarchical topic modeling approach. To interpret the extracted topics, we apply an automatic topic labeling technique based on pointwise mutual information (PMI). The methodology is applied to NASA’s publicly available Lessons Learned Information System (LLIS). Partitioning the topics enables the extraction of three aspects (cause, failure, and recommendation), while the hierarchy organizes them into a taxonomy. The methodology generalizes to other databases of narrative-style documents, and the LLIS results summarize the themes in that dataset in a format that can be linked to early design failure analyses.
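As a rough illustration of the PMI quantity mentioned above, the following minimal sketch scores word pairs by how much more often they co-occur in a document than chance would predict, PMI(a, b) = log2(p(a, b) / (p(a) p(b))). It is not the labeling procedure used in this work: the function name `pmi_scores`, the document-level co-occurrence window, and the toy sentences are all assumptions made for the example.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_scores(documents):
    """Return PMI for each co-occurring word pair (hypothetical helper).

    Probabilities are estimated from document frequencies: p(a) is the
    fraction of documents containing a, and p(a, b) is the fraction
    containing both a and b.
    """
    n_docs = len(documents)
    word_df = Counter()
    pair_df = Counter()
    for doc in documents:
        # Unique, alphabetically sorted words so each pair has one key order.
        words = sorted(set(doc.lower().split()))
        word_df.update(words)
        pair_df.update(combinations(words, 2))

    scores = {}
    for (a, b), joint in pair_df.items():
        p_ab = joint / n_docs
        p_a = word_df[a] / n_docs
        p_b = word_df[b] / n_docs
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores


# Toy narratives, invented for illustration only (not LLIS data).
toy_docs = [
    "valve leak caused pressure loss",
    "valve leak during test",
    "software error caused reset",
    "software error during test",
]
scores = pmi_scores(toy_docs)
```

In this toy corpus, pairs that always appear together, such as ("leak", "valve"), receive a positive score, while pairs that co-occur only as often as chance predicts, such as ("caused", "valve"), score zero. A labeling step could then rank candidate phrases for a topic by such scores.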