
Named Entity Recognition using Deep Learning (ELMo Embedding + Bi-LSTM)

Introduction:

Named-entity recognition (NER), also known as entity identification, entity chunking and entity extraction, is a subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories such as person names, organisations, locations, medical codes, time expressions, quantities, monetary values, percentages, etc.

It adds a wealth of semantic knowledge to your content and helps you promptly understand the subject of any given text.

Applications:

A few applications of NER include extracting important named entities from legal, financial and medical documents, classifying content for news providers, and improving search algorithms.

Approaches to tackle this problem:


1. Machine Learning Approach: treat the problem as multi-class classification with the named entities as our labels. The problem here is that, for longer sentences, identifying and labelling named entities requires a thorough understanding of the context of the sentence and of the sequence of word labels in it, which this method ignores; it cannot capture the essence of the entire sentence.

2. Deep Learning Approach: the model best suited to this problem is the Long Short-Term Memory (LSTM) network; specifically, we will use a bi-directional LSTM for our setup. A bi-directional LSTM is a combination of two LSTMs: one runs forward over the sentence from left to right and the other runs backward from right to left, thus capturing the entire essence/context of the sentence. For NER, since the context covers past and future labels in a sequence, we need to take both past and future information into account.
[Figure: Bi-LSTM]

Embedding Layer: ELMo (Embeddings from Language Models). ELMo is a deep contextualised word representation that models both:

(1) complex characteristics of word use (e.g., syntax and semantics), and

(2) how these uses vary across linguistic contexts (i.e., it models polysemy). Example: although the term 'Apple' is common, ELMo will give it different embeddings as a fruit and as an organisation, thanks to its contextual logic.

Example: we also need not worry about out-of-vocabulary (OOV) tokens in the training data, since ELMo builds its representations from characters and will generate an embedding for those as well.

These word vectors are learned functions of the internal states of a deep bidirectional language model (biLM), which is pre-trained on a large text corpus. They can be easily added to existing models and significantly improve the state of the art across a broad range of challenging NLP problems, including question answering, textual entailment and sentiment analysis.

[Figure: ELMo]

Let's see how we can approach this problem:


1. Data Acquisition: We are going to use a dataset from Kaggle. Please go through the data to learn more about the different tags used (a loading sketch follows below). We have 47,958 sentences in our dataset, 35,179 different words, 42 different POS tags and 17 different named-entity tags.

In this article we will build two different models, for predicting the Tag and the POS respectively.
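Here is a minimal loading sketch. The file name, encoding and column names (Sentence #, Word, POS, Tag) follow the commonly used Kaggle "Entity Annotated Corpus" layout and are assumptions; the repository linked at the end may load the data slightly differently.

```python
import pandas as pd

# Assumed file name, encoding and column layout of the Kaggle
# "Entity Annotated Corpus" (ner_dataset.csv): Sentence #, Word, POS, Tag.
data = pd.read_csv("ner_dataset.csv", encoding="latin1")

# The "Sentence #" column is only filled on the first word of each sentence,
# so forward-fill it before grouping.
data = data.fillna(method="ffill")

words = list(set(data["Word"].values))
tags = list(set(data["Tag"].values))
print(data["Sentence #"].nunique(), len(words), len(tags))
# roughly: 47958 sentences, 35179 words, 17 tags
```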

2. Next we use a class that converts every sentence, together with its named entities (tags), into a list of tuples [(word, named entity), …], as sketched below.
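A minimal sketch of such a class, assuming the pandas DataFrame `data` from the loading step above. Because the dataset also carries POS tags, each tuple here keeps (word, POS, tag); the exact class in the linked repository may differ.

```python
class SentenceGetter(object):
    """Group the flat word-per-row table into one list of (word, POS, tag) tuples per sentence."""

    def __init__(self, data):
        agg_func = lambda s: [(w, p, t) for w, p, t in zip(s["Word"].values,
                                                           s["POS"].values,
                                                           s["Tag"].values)]
        self.grouped = data.groupby("Sentence #").apply(agg_func)
        self.sentences = list(self.grouped)

getter = SentenceGetter(data)
sentences = getter.sentences
print(sentences[0][:3])  # e.g. [('Thousands', 'NNS', 'O'), ('of', 'IN', 'O'), ...]
```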
3. Let's have a look at the distribution of sentence lengths in the dataset. The longest sentence has 140 words, and almost all sentences have fewer than 60 words. Due to hardware constraints we will use a smaller length, i.e. 50 words, which can be processed easily.
4. Let's create word-to-index and index-to-word mappings, which are needed to convert words to indices before training and back again after prediction (see the sketch below).
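A small sketch of these mappings, assuming the `words` and `tags` lists built earlier. Only the tags strictly need a numeric mapping here, since the ELMo layer later consumes raw word strings, but both directions are shown as the step describes.

```python
max_len = 50  # pad/truncate every sentence to 50 tokens (see step 3)

word2idx = {w: i for i, w in enumerate(words)}
idx2word = {i: w for w, i in word2idx.items()}

tag2idx = {t: i for i, t in enumerate(tags)}
idx2tag = {i: t for t, i in tag2idx.items()}
```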
5. From the list of tuples generated earlier, we now build the independent and dependent variable structures.

• Independent variable / word corpus: each sentence becomes a fixed-length sequence of word strings.

• The same applies for the named entities, but this time we need to map our labels to numbers.

6. Train-Test Split (90:10): see the sketch below.
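A sketch of steps 5 and 6 together, assuming `sentences`, `tag2idx` and `max_len` from above. The "__PAD__" padding token and the random seed are illustrative choices, not taken from the original code.

```python
import numpy as np
from keras.preprocessing.sequence import pad_sequences
from sklearn.model_selection import train_test_split

# X: each sentence as a fixed-length list of word strings (ELMo consumes raw tokens),
# padded with a literal "__PAD__" string.
X = [[w[0] for w in s][:max_len] for s in sentences]
X = [s + ["__PAD__"] * (max_len - len(s)) for s in X]

# y: the tag sequence mapped to integers, padded/truncated with the index of the 'O' tag.
y = [[tag2idx[w[2]] for w in s] for s in sentences]
y = pad_sequences(maxlen=max_len, sequences=y, padding="post",
                  truncating="post", value=tag2idx["O"])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=2021)
```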

7. Batch Training: since we use a batch size of 32, the network must be fed in chunks whose sizes are all multiples of 32, as in the trimming sketch below.
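One simple way to enforce this is to trim each split to an exact multiple of the batch size; how the original code handles it is an assumption.

```python
batch_size = 32

# The ELMo layer below is built for a fixed batch size, so drop the few
# trailing examples that would leave an incomplete batch.
def trim_to_batch(a, bs=batch_size):
    return a[: (len(a) // bs) * bs]

X_tr, y_tr = trim_to_batch(X_tr), trim_to_batch(y_tr)
X_te, y_te = trim_to_batch(X_te), trim_to_batch(y_te)
```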

8. Loading the ELMo Embedding Layer: we will import TensorFlow Hub (a library for the publication, discovery and consumption of reusable parts of machine learning models) to load the ELMo embedding, wrap it in a function so that we can use it as a layer, and start building our Keras network.

Please downgrade your TensorFlow package to use this code. If you want to do the same in TF 2 or greater, you have to use hub.load(url) and then create a KerasLayer(…, trainable=True).
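A TF 1.x style sketch of loading the ELMo module from TF-Hub and wrapping it so it can sit inside a Keras Lambda layer. The URL is the public ELMo v2 module; the wrapper follows the widely used `hub.Module` pattern rather than the repository's exact code.

```python
import tensorflow as tf
import tensorflow_hub as hub
from keras import backend as K

# TF 1.x session setup, matching the "downgrade" note above.
sess = tf.Session()
K.set_session(sess)

elmo_model = hub.Module("https://tfhub.dev/google/elmo/2", trainable=True)
sess.run(tf.global_variables_initializer())
sess.run(tf.tables_initializer())

def ElmoEmbedding(x):
    """Feed a batch of token strings to the ELMo module and return the contextual vectors."""
    return elmo_model(inputs={"tokens": tf.squeeze(tf.cast(x, tf.string)),
                              "sequence_len": tf.constant(batch_size * [max_len])},
                      signature="tokens", as_dict=True)["elmo"]
```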

9. Designing our Neural Network (a full architecture sketch follows this list):

• Embedding layer (ELMo): we specify the maximum length (50) of the padded sequences. After the network is trained, the embedding layer transforms each token into a vector of n dimensions.

• Bidirectional LSTM: the Bidirectional wrapper takes a recurrent layer (e.g. an LSTM layer) as an argument and consumes the output of the previous embedding layer. We will use two Bi-LSTM layers with a residual connection back to the first Bi-LSTM.

• TimeDistributed layer: we are dealing with a many-to-many RNN architecture, where we expect an output for every input step. For example, in the sequence (a1 → b1, a2 → b2, …, an → bn), a and b are the input and output at every step. The TimeDistributed Dense layer applies the same Dense (fully-connected) operation to the output at every time step, producing the final per-token predictions.
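A sketch of the architecture described above. The layer sizes (512 units) and dropout rates are illustrative assumptions; 1024 is the dimensionality of the ELMo output vectors.

```python
import tensorflow as tf
from keras.models import Model
from keras.layers import Input, Lambda, Bidirectional, LSTM, Dense, TimeDistributed, add

n_tags = len(tags)

input_text = Input(shape=(max_len,), dtype=tf.string)
embedding = Lambda(ElmoEmbedding, output_shape=(max_len, 1024))(input_text)

x = Bidirectional(LSTM(units=512, return_sequences=True,
                       recurrent_dropout=0.2, dropout=0.2))(embedding)
x_rnn = Bidirectional(LSTM(units=512, return_sequences=True,
                           recurrent_dropout=0.2, dropout=0.2))(x)
x = add([x, x_rnn])  # residual connection around the second Bi-LSTM

out = TimeDistributed(Dense(n_tags, activation="softmax"))(x)

model = Model(input_text, out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```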
10. Training: we ran this for only 1 epoch since it was taking a lot of time, but the results are awesome.
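Training for a single epoch might look roughly like this; the reshape adds the trailing dimension that sparse categorical cross-entropy expects.

```python
import numpy as np

history = model.fit(np.array(X_tr), y_tr.reshape(len(y_tr), max_len, 1),
                    batch_size=batch_size, epochs=1, verbose=1)
```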

11. Batch Prediction, using the index-to-tag mapping to convert the predicted indices back into tag strings.
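A sketch of batch prediction and the index-to-tag conversion, assuming `idx2tag` from step 4.

```python
import numpy as np

# Predict on the (batch-trimmed) test set, then map the argmax indices back to tag strings.
p = model.predict(np.array(X_te), batch_size=batch_size)
p = np.argmax(p, axis=-1)

pred_tags = [[idx2tag[i] for i in row] for row in p]
true_tags = [[idx2tag[i] for i in row] for row in y_te]
```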
12. Evaluation Metric: in the case of NER we may be dealing with important financial, medical or legal documents, and precise identification of the named entities in those documents determines the success of the model. In other words, false positives and false negatives both have a business cost in an NER task. Therefore, our main metric for evaluating the models is the F1-score, because we need a balance between precision and recall.
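The post does not name an evaluation library, so as one option the entity-level F1 can be computed with seqeval from the tag sequences produced above.

```python
from seqeval.metrics import classification_report, f1_score

print("F1-score: {:.1%}".format(f1_score(true_tags, pred_tags)))
print(classification_report(true_tags, pred_tags))
```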

• We were able to get an F1-score of 81.2%, which is pretty good; the micro, macro and average F1 scores are also good. If you train this for more epochs you will definitely get better results.

13. Comparing our results with spaCy: we can see that our model was able to detect every tag correctly, even after a single epoch.

[Figure: our model's results]

[Figure: spaCy's results]
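For reference, a spaCy baseline can be obtained like this; the model name and the sentence are illustrative, not taken from the post.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The United Nations said Tuesday it will send aid to Ethiopia.")
for ent in doc.ents:
    print(ent.text, ent.label_)
```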

14. Part-of-Speech Tagging/Prediction: since we also have part-of-speech (POS) tags in our dataset, we can build a similar model to predict those as well. I have implemented that too and trained it for 1 epoch, and the results were again awesome.

• We were able to get an F1-score of 97.1%, which is pretty good; the micro, macro and average F1 scores are also good.

Comparing our results with spaCy: we can see that our model was able to detect every tag correctly, even after a single epoch.

[Figure: our model's results]

[Figure: spaCy's results]

Thanks for reading this blog. If you liked it, please clap, follow and share.

Where can you find my code?

Github: https://github.com/SubhamIO/Named-Entity-Recognition-using-ELMo-BiLSTM

References:

1. https://jalammar.github.io/illustrated-bert/

2. https://arxiv.org/pdf/1802.05365.pdf

3. https://en.wikipedia.org/wiki/Named-entity_recognition
4. https://allennlp.org/elmo

5. https://sunjackson.github.io/2018/12/11/1ef8909353df3395a36f3f4d3336269b/
