SCA - Module 7
Text Mining Process
[Figure: text mining process diagram with three tasks (Task 1, Task 2, Task 3) and the labels "Domain expertise", "Tools and techniques", and "Feedback".]
• Inputs: a variety of relevant unstructured (and semi-structured) data sources such as text, XML, HTML, etc.
• Output of Task 1: a collection of documents in some digitized format for computer processing
• Output of Task 2: a flat file called the term-document matrix, where the cells are populated with the term frequencies
• Output of Task 3: a number of problem-specific classification, association, and clustering models and visualizations
[Figure: example term-document matrix; rows are documents, cells hold term frequencies.]
Text Mining Process
• Step 2: Create the Term–by–Document Matrix (TDM)
• Should all terms be included?
• Stop words, include words
• Synonyms, homonyms
• Stemming
• What is the best representation of the indices (values in cells)?
• Raw counts; binary frequencies; log frequencies
• Inverse document frequency
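A minimal sketch of Step 2, assuming a toy three-document corpus, a tiny hand-picked stop-word list, and whitespace tokenization (all hypothetical, chosen only for illustration). Stemming and synonym handling are left out. It builds a matrix with documents as rows and then derives three of the cell representations listed above: raw counts, binary frequencies, and log frequencies.

```python
import math
from collections import Counter

# Hypothetical toy corpus and stop-word list, for illustration only.
docs = {
    "Document 1": "the port delays the shipping of goods",
    "Document 2": "shipping costs and port congestion",
    "Document 3": "the chip shortage delays production",
}
stop_words = {"the", "of", "and"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in stop_words]

# Raw counts: one Counter of term frequencies per document.
raw = {name: Counter(tokenize(text)) for name, text in docs.items()}
terms = sorted({t for counts in raw.values() for t in counts})

def tdm(weight):
    """Return {document: [weighted value per term]} for a weighting function."""
    return {name: [weight(counts[t]) for t in terms] for name, counts in raw.items()}

raw_tdm = tdm(lambda tf: tf)                                   # raw counts
binary_tdm = tdm(lambda tf: 1 if tf > 0 else 0)                # binary frequencies
log_tdm = tdm(lambda tf: 1 + math.log(tf) if tf > 0 else 0.0)  # log frequencies

print(terms)
for name in docs:
    print(name, raw_tdm[name])
```

Even on this toy corpus most cells are zero, which is why the matrix is usually stored and processed in sparse form.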
Text Mining Process
• Step 2: Create the Term–by–Document Matrix (TDM)
• TDM is a sparse matrix. How can we reduce the dimensionality of the
TDM?
• Manual - a domain expert goes through it
• Eliminate terms with very few occurrences in very few documents (?)
• Transform the matrix using SVD
• SVD is similar to principal component analysis
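One of the reduction options above, eliminating rare terms, can be sketched as follows. The tokenized documents and the `min_df` threshold (minimum number of documents a term must appear in) are hypothetical choices for illustration; SVD-based reduction is not shown here.

```python
from collections import Counter

# Hypothetical tokenized documents, for illustration only.
docs = [
    ["port", "shipping", "delay"],
    ["port", "congestion", "shipping"],
    ["chip", "shortage", "delay"],
    ["port", "strike"],
]

def prune_rare_terms(docs, min_df=2):
    """Keep only terms that appear in at least `min_df` documents."""
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(t for t, n in df.items() if n >= min_df)
    # Rebuild the matrix over the reduced vocabulary (rows = documents).
    matrix = [[doc.count(t) for t in vocab] for doc in docs]
    return vocab, matrix

vocab, matrix = prune_rare_terms(docs, min_df=2)
print(vocab)
print(matrix)
```

Here the vocabulary shrinks from seven terms to three. The right threshold depends on the corpus, which is presumably why the slide marks this option with a question mark.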
Text Mining Process
• Step 3: Extract patterns/knowledge
• Classification (text categorization)
• Clustering (natural groupings of text)
• Improve search recall
• Improve search precision
• Scatter/gather
• Query-specific clustering
• Association rules
• Trend Analysis
Exploratory data analysis
• Gives insight about the data such as:
• Class distribution
• Top occurring words in the dataset
• Distribution of words per document
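The three insights above can be computed with a few lines of standard-library Python. This is a sketch on a hypothetical labeled dataset (the labels and texts are made up for illustration), using plain whitespace tokenization.

```python
from collections import Counter

# Hypothetical labeled documents, for illustration only.
dataset = [
    ("logistics", "port congestion delays container shipping"),
    ("logistics", "freight costs rise as port strikes continue"),
    ("energy", "oil and gas prices rise"),
]

# Class distribution: number of documents per label.
class_distribution = Counter(label for label, _ in dataset)

# Top occurring words across the whole dataset.
word_counts = Counter(w for _, text in dataset for w in text.lower().split())
top_words = word_counts.most_common(3)

# Distribution of words per document.
words_per_doc = [len(text.split()) for _, text in dataset]

print(class_distribution)
print(top_words)
print(words_per_doc)
```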
• Trade in Asia
• Logistics
• Macroeconomic issues
• China issues
• Energy issues
• Semiconductor shortages
• Automobile supply chain
Sadeek and Hanaoka, 2023. Social Network Analysis and Mining
Trade in Asia
• Risks in Asian Market
• Trade in Asia
• The combination: “asia world time china japan people nikkei
political system covid lot big markets point coming real risks.”
• This word combination indicates plausible risks in the Asian
market
• A cluster of “trade japan taiwan economic south india korea
china security minister australia president indopacific”
• This indicates a trade issue in the Asian region
Logistics and supply chain: Shipping, Port and Logistics
• The word combination: “shipping port ports goods biden Canada cargo transport container freight logistics air
president white house.”
• This cluster covers issues related to shipping, ports, containers, freight, logistics, cargo, etc. The presence of other
words, such as biden, Canada, president, and white house, may be indicative of the truck driver strikes at the USA–Canada
border.
• Retailers and Shopping risk
• The cluster is: “labor company stores holiday chain products forced retailers supply retail store vietnam cotton online
xinjiang.”
• These words suggest that, during the Omicron wave, Christmas and Black Friday may have been the dominant occasions in
terms of sales, and that labor shortages may have been experienced during this time
• Supply Chain Revenue - Regional Supply Chain - Supply Chain Shortage
• Supply chain shortages were very common.
• From the beginning of the pandemic, “global shortage” and “disruptions” had been in the news constantly.
• In addition, “Regional Supply Chain” is represented by “billion company business companies group market investors
yuan million financial arm capital.”
• The word “regional” may indicate financial investment in the local market
Macroeconomic issues
• Price Surge risk
• The combination: “inflation bank rate central rates policy prices interest monetary fed
economy higher price market global consumer.”
• This topic represents the issue of price hikes, inflation and struggles experienced by
financial policymakers. “Price Surge” has been a problem since the beginning of the
COVID-19 pandemic due to issues of international trade
• Raw Material Import–Export
• It reflects issues in the importing and exporting of raw materials.
• Economic Growth
• The combination: “growth economic economy exports supply demand covid pandemic
domestic quarter expected gdp.”
• The pandemic and the war have both decreased economic growth in most countries.
Economic growth largely depends on imports, exports and trade. Therefore, news media
identified this topic as an important risk for supply chain management.
China issues
• China's Foreign Strategy
• China’s Covid Policy
• Due to China’s international policy, supply chain operations
experienced issues related to trade, port congestion, and air and maritime
transport
• Energy issues
• Food and Oil Price Hit
• Energy Supply
• Energy prices have increased due to Russia and Ukraine no longer
exporting gas and oil
• Many countries are experiencing energy price hikes
Semiconductor shortage
• Chip Industry and Shortage
• Electronic Parts Production
• The word combination: “chip semiconductor chips industry taiwan manufacturing samsung global
billion production supply”
• Another cluster: “apple nikkei asia production told foxconn components suppliers
company iphone china tech supply maker”
• These clusters of words clearly refer to the supply side of chip production and the
potential shortage of semiconductors for tech giants.
• Automobile supply chain
• Electric Car Production
• Automobile Supply Chain
• Supply chain disruption is largely related to the supply and production of electric and
electronic parts, semiconductor supply and energy supply (e.g., in Japan)
Logistics and supply chain issues
• In terms of logistics, some topics were detected on Twitter:
• Cargo Shipping Restrictions
• Supply Delay—Retailers
• Supply Shortage—War
• Supply Chain Resiliency
• Logistics Tension
Text Mining Tools
• Commercial Software Tools
• SPSS PASW Text Miner
• Statistica Data Miner
• Free Software Tools
• Netlytic (https://netlytic.org/)
• Voyant tool (https://voyant-tools.org/)
• ATLAS.ti (https://atlasti.com/)
• Topic modeling tool from Google code
(https://code.google.com/archive/p/topic-modeling-tool/)
Vector space modeling
• Set-of-Words: Documents represented by vectors ∈ {0, 1}^|Σ|
• Bag-of-Words: Documents represented by term-frequency vectors ∈ ℕ^|Σ|
Issues with Sets and Bag of Words
• The representation has high computational complexity
• Dimensionality blow-up: |Σ| could be very large
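The two representations can be sketched as follows, taking Σ to be the vocabulary of distinct terms across all documents (an assumption, since the slides do not define it). The two toy documents are hypothetical.

```python
from collections import Counter

# Hypothetical toy documents, for illustration only.
docs = ["port delay port strike", "chip shortage delay"]

# The vocabulary (assumed meaning of Σ): distinct terms across all documents.
vocab = sorted({w for d in docs for w in d.split()})

def set_of_words(doc):
    """Binary vector in {0, 1}^|Σ|: 1 if the term occurs in the document."""
    present = set(doc.split())
    return [1 if t in present else 0 for t in vocab]

def bag_of_words(doc):
    """Term-frequency vector in N^|Σ|: how often each term occurs."""
    counts = Counter(doc.split())
    return [counts[t] for t in vocab]

print(vocab)
print(set_of_words(docs[0]))
print(bag_of_words(docs[0]))
```

Every vector has one entry per vocabulary term, so its length grows with the whole corpus rather than with the document, which is the dimensionality blow-up noted above.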
Vector space modeling
Tf-idf (Term frequency-Inverse document frequency) is a more refined
model for selecting features to represent texts
• Key idea is to find special words characterizing the document
• Frequency:
• A first idea: the most frequent words are the most significant in a doc
• But the most frequent words (“the”, “are”, “and”) help English structure and build
ideas and are not significant in characterizing documents
• Rarity: rare words are indicators of topics
• Words that are rare overall but concentrated in a few docs, e.g., “batsman”, “prime-minister”
• ball, bat, pitch, catch, run ⟹ cricket-related doc
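A minimal sketch of the idea, using one common tf-idf variant: tf-idf(t, d) = (count of t in d / length of d) × log(N / df(t)), where N is the number of documents and df(t) is the number of documents containing t. The exact formula used in this course may differ, and the toy documents are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy documents, for illustration only.
docs = [
    "the batsman hit the ball".split(),
    "the prime-minister gave the speech".split(),
    "the bat and the ball are on the pitch".split(),
]

n_docs = len(docs)
# Document frequency: number of documents containing each term.
df = Counter(term for doc in docs for term in set(doc))

def tf_idf(doc):
    """tf-idf per term: (count / doc length) * log(N / df)."""
    counts = Counter(doc)
    return {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}

scores = tf_idf(docs[0])
for term, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

In this sketch “the” is the most frequent word in the first document but scores zero because it appears in every document, while “batsman”, which appears in only one document, scores highest.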
TF-IDF
• Dream of AI community
• to have algorithms that are capable of automatically reading and
obtaining knowledge from text
NLP Task Categories
• Information retrieval
• Information extraction
• Named-entity recognition
• Question answering
• Automatic summarization
• Natural language generation and understanding
• Machine translation
• Foreign language reading and writing
• Text proofing
Use of NLP in SC logistics
Customer and partner communication:
• NLP-enabled chatbots are well suited to automating customer service
tasks and communicating with logistics partners
• They can assist with order tracking, scheduling, and resolving issues
Document automation
• In logistics, the processing of various documents like invoices,
shipping labels, and customs forms is tedious but critical
• NLP can automate this process by reading and interpreting text data,
even in different languages and formats, and populating databases or
systems as required
Use of NLP in SC logistics
Real-time monitoring and alerts
• Advanced NLP systems can analyze textual data from multiple sources
like emails, social media, news outlets, and other public records to
generate real-time alerts about potential disruptions in logistics
operations
• For example, they can send warnings about possible strikes, road
closures, or natural disasters that could impact the supply chain