Abstract

Natural language descriptions of the capabilities of manufacturing companies can be found in multiple locations, including company websites, legacy system databases, and ad hoc documents and spreadsheets. Unlocking the value of this unstructured capability data and learning from it requires advanced quantitative methods supported by machine learning and natural language processing techniques. This research proposes a hybrid unsupervised learning methodology that uses K-means clustering and topic modeling to build clusters of suppliers based on their capabilities, automatically infer topics from the created clusters, and discover nontrivial patterns in manufacturing capability corpora. The capability data are extracted either directly from the websites of manufacturing firms or from their profiles in e-sourcing portals and directories. The feature extraction and dimensionality reduction steps in this work are supported by N-gram extraction and latent semantic analysis (LSA). The proposed clustering method is validated experimentally on a dataset of 150 capability descriptions collected from web-based sourcing directories such as the Thomas Net directory for manufacturing companies. The results of the experiment show that the proposed method creates supplier clusters with high accuracy. Two example applications of the proposed framework, supplier similarity measurement and automated thesaurus creation, are introduced in this paper.
