

International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 05 | May 2020 www.irjet.net p-ISSN: 2395-0072

A Cloud based Medical Transcription using Speech Recognition Technologies

Sushmita Kulkarni¹, Dattaprasad A. Torse², Deepak Kulkarni³

¹M.Tech, Department of Electronics and Communication Engineering, KLS Gogte Institute of Technology, Belagavi, India
²,³Department of Electronics and Communication Engineering, KLS Gogte Institute of Technology, Belagavi, India

----------------------------------------------------------------------***-------------------------------------------------------------------
Abstract: Digital healthcare has become a prominent platform for delivering treatment today. One such initiative is a doctor-friendly digital system that, unlike traditional methods, allows doctors to store patient details, consultations, surgeries performed and other related information for every patient. This paper presents a prototype of a digital medical transcription platform that helps surgeons and physicians document patient consultations and surgical summaries by recording them at the click of a button. The prototype uses open source networking technologies: uniMRCP, the scalable open source EPBX FreeSwitch, standard Voice over IP protocols (SIP for signalling and RTP for audio media), and speech recognition engines that support uniMRCP, such as Google Speech Recognition and CMU's PocketSphinx. The main idea is to transform a voice recording into a text document that can be presented as part of an Electronic Medical Record system using speech recognition and synthesis technologies.

Key Words: Cloud, Freeswitch, uniMRCP, Google Speech Recognition plugin, PocketSphinx.

I. INTRODUCTION

Surgeons still rely on traditional methods to document the procedures and surgeries they carry out: they either keep paper records or type the details of each procedure into a text editor of their choice. Some of this documentation is not fully integrated with electronic patient record systems. In smaller hospitals, doctors continue to use paper documentation for every record of patient history, surgeries performed and other clinical details. Doctors and associated medical staff spend a great deal of time documenting and recording patient details during clinical visits and after surgery. Many of these manual, time-consuming documentation activities can be digitized using the voice-activated, cloud-based recording and transcription technologies presented in this paper. These technologies work together to help healthcare providers digitize the various documents seamlessly. As record handling has evolved, modern practice has moved from typing long patient records to transcribing them; transcription here means speech-to-text conversion using speech recognition technologies.

Relatively few solutions have been reviewed and presented for electronic medical record transcription using networking tools, although there are many works describing speech-to-text conversion using machine learning and AI technologies.

E-Healthcare systems can come to the rescue of people during pandemics, natural disasters, or at times when a patient cannot reach a hospital. The main focus of this work is to enable and evolve a digital healthcare platform, bridging the gap between the doctors' community and patients, and thereby becoming the doctor's partner in enhancing clinical OPD and online consultations using newer technologies.

II. LITERATURE

Transcribing paper-based archives into digital form has been a helpful step for educators, clinical researchers and people capturing field data for research, where records were kept in spreadsheets on a regular basis. Medical records involving confidential patient data, in particular, need to be handled very sensitively. Mohammad M. Ghassemi et al. [1] propose a tool based on machine learning, crowd intelligence, optical character recognition, image segmentation and crowd sourcing for transcribing images of paper-based spreadsheets into digital form while protecting personal information. The algorithm follows four steps: (1) cell-level image extraction, (2) recognition of the digits within each cell using machine learning, (3) human correction of uncertain machine transcriptions, and (4) feeding the human-transcribed results back to improve machine classification performance. The limitations are the prolonged transcription process, the need to collect spreadsheet data on a regular basis, and the heavyweight machine learning algorithms involved.
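To make the flow of steps (1)-(4) concrete, a rough, hypothetical sketch of such a human-in-the-loop loop is given below. This is not the tool released with [1] (that code is linked in the reference); the classifier, confidence threshold and review step are placeholder assumptions.

```python
# Hypothetical human-in-the-loop transcription loop for spreadsheet cell images.
# The classifier and review step are stand-ins, not the released tool from [1].

def classify_digit(cell_image):
    """Placeholder ML classifier: returns (predicted_label, confidence)."""
    return "7", 0.62  # dummy prediction for illustration only

def ask_human(cell_image):
    """Placeholder crowd-sourced review: a person keys in the true value."""
    return input("Enter the digit shown in this cell: ")

def transcribe_cells(cell_images, threshold=0.9):
    """Steps (2)-(4): classify each extracted cell image, route uncertain
    cases to a human, and collect corrections as retraining feedback."""
    labels, training_feedback = [], []
    for img in cell_images:            # input comes from step (1), cell extraction
        label, confidence = classify_digit(img)
        if confidence < threshold:     # step (3): uncertain output goes to a human
            label = ask_human(img)
            training_feedback.append((img, label))  # step (4): feedback for retraining
        labels.append(label)
    return labels, training_feedback
```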
Over the past two decades, awareness of Health Information Technology (HIT) hazards and safety-risk analysis has been introduced in medical and healthcare organizations. The main idea is to link events to types of hazards with efficient, engineer-centric solutions for safety during adverse situations. A healthcare hazard can mean losing sensitive patient records when awareness within healthcare is low, and this may affect patient safety through erroneous output of a Medical Information System (MIS) such as the Electronic Health Record. Richard W. Jones et al. [2] highlight the prevalence of indirect hazards and the regulatory standard measures implemented when deploying a solution to this problem. Their research focuses not only on identifying and removing direct risks but also indirect ones after deployment. The problem is addressed in three ways: (1) a modified MIL-STD-882E addresses currently existing deficiencies when a user makes executable decisions, defining risks associated with erroneous Information System (IS) failure modes and a Software Control Categories (SCC) hazard severity table that yields a Software Criticality Index (SwCI), whose output feeds a Level of Rigor (LOR); (2) risk assessment for health applications (mHealth apps); and (3) a generic 8-step IS safety management process adapted to applications. It is very important to have a hazard-free transcribing system; hence patient data safety is always a priority.

Voice over Internet Protocol (VoIP) and Electronic Private Branch Exchange (EPBX) are cost-effective methods compared with traditional telephony. Asterisk is an open source, Linux-based server and Private Branch Exchange framework that lets a user build a phone system of their choice thanks to its flexibility in customizing modules. Mohammed Abdul Qadeer et al. [3] implement an Asterisk server within a local Wi-Fi network and the Public Switched Telephone Network (PSTN) for registered devices within a university. The application architecture involved an Asterisk server, a client and a PSTN exchange for placing voice and video calls over a private Wi-Fi cloud.

Real-time Online Interactive Applications (ROIA) are distributed applications emerging at large scale in cloud environments, and they bring issues such as scalability and network latency. Previous researchers focused on mixed deployment and extension of ROIA. Liu Dong et al. [4] propose a system in which MRCP is deployed to overcome the network latency and scalability issues of ROIA in cloud computing. The solution focuses on the MRCP architecture and an external balancing strategy to cope with fluctuations in concurrent users and network latency requirements. The architecture has ROIA Servers (RS) and one MRCP Local Controller (MLC) for each data centre distributed across the world; the MLC and RS are responsible for load balancing, storage, zoning and instancing.

FreeSwitch acts as a PBX (Private Branch Exchange) server, an open source scalable soft-switch that follows a client-server architecture. Abdullah Mohammad Ansari et al. [5] implemented an Interactive Voice Response System (IVRS) model for Session Initiation Protocol (SIP) based phones. The backbone of the application is the scalable FreeSwitch server, to which SIP soft phones, either desktop or mobile clients, connect. The SIP clients register themselves with the FreeSwitch server, which in turn holds information about all registered clients and other connected FreeSwitch servers. The idea of accessing information in the phone's web browser while on a call could save time and make this a more reliable approach.

Making a computer system understand what we speak is computer speech recognition, i.e. interpretation of voice in the form of text, and there is a wide range of speech recognition software for speech-to-text conversion. Aditya Amberkar et al. [6] propose a speech-to-text model based on speech recognition with Recurrent Neural Networks (RNN) for prediction. The speech, an analog signal, is first digitized by sampling according to the Nyquist theorem and pre-processed into 20-millisecond chunks; this pre-processed data is fed to the RNN. Applying RNNs increases accuracy in many speech-to-text engines, such as the Java- and Python-based Snowboy hot-word detection and the C-based CMU PocketSphinx. Amazon's Alexa and Google's STT are online speech-to-text engines, whereas CMU PocketSphinx is an offline conversion engine whose dataset is trained online. Although training the RNN algorithm is complex, it proves to be among the best approaches for speech processing and voice-controlled technologies.

Worldwide, commercial applications show high demand for Automatic Speech Recognition (ASR), but in India it is still evolving. Chadalavada Sai Manasa et al. [7] developed an acoustic model for speech recognition in Hindi using CMU's PocketSphinx, with a database of 177 words and a cross-language dictionary adapted from English. The ASR model is based on Gaussian Mixture Hidden Markov Model (GMM-HMM) acoustic modelling, using LPC and Mel Frequency Cepstral Coefficients (MFCC) for feature extraction. PocketSphinx is a lightweight, free, real-time, continuous, medium-vocabulary speech recognition system developed for handheld devices.

FreeSwitch is a highly scalable engine for routing and interconnecting communication protocols for any type of media, namely audio, video or text, and is a cross-platform telephone exchange that bridges business solution gaps. It uses embedded languages like Lua or JavaScript, which makes it more flexible. Wei Tang et al. [8] introduce a soft-switch solution based on FreeSwitch for efficient communication dispatching and information accuracy, using an IMS architecture and an SG-UAP based application.
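As a concrete illustration of the pre-processing step described for the RNN-based recognizer in [6] above, the sketch below frames a digitized speech signal into 20-millisecond chunks. The sample rate, hop size and the synthetic test signal are illustrative assumptions, not values taken from the cited work.

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=20, hop_ms=10):
    """Split a 1-D speech signal into overlapping 20 ms frames (10 ms hop)."""
    frame_len = int(sample_rate * frame_ms / 1000)   # 320 samples at 16 kHz
    hop_len = int(sample_rate * hop_ms / 1000)       # 160 samples at 16 kHz
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop_len)
    frames = np.stack([signal[i * hop_len : i * hop_len + frame_len]
                       for i in range(n_frames)])
    return frames                                    # shape: (n_frames, frame_len)

# Illustrative usage on one second of synthetic audio; a real system would
# feed these frames (or features derived from them, e.g. MFCCs) to the RNN.
audio = np.random.randn(16000).astype(np.float32)
frames = frame_signal(audio)
print(frames.shape)   # (99, 320)
```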


Sila Chunwijitra et al. [9] propose a cloud-based framework for speech recognition in the Thai language. They also deploy the Docker lightweight Linux container platform to migrate a baseline Distributed Speech Recognition (DSR) system. The main idea is to improve real-time response using cloud computing. Furthermore, the workflow is modified by running multiple Speech Recognition (SR) engines in parallel with the help of utterance decoding. The Word Error Rate (WER) is then computed, and the results appear scalable and reliable, with no significant difference between the proposed and baseline approaches. Overall performance is therefore boosted by cloud computing, with improved response time in terms of the real-time factor (RTF).

Resource sharing is a key benefit of cloud-based web services. Sila Chunwijitra et al. [10] focus on distributing and sharing resources for Automatic Speech Recognition (ASR) applications. For transcription, ASR needs more resources because many utterances must be handled in real time. The key solutions are scaling ASR by multithreaded processing, exploiting multiplexing and demultiplexing on the network socket, and distributing ASR for real-time streaming across engines (load balancing). The proposed framework reduces RTF by 15% compared with the baseline architecture and consumes fewer resources such as working memory.

The Google Cloud Speech API is Google's speech-to-text and text-to-speech service, whose recognition accuracy is high thanks to its deep learning neural network algorithms. The algorithms do not require high-performance processors on the client side because everything is processed in the cloud. Gustavo Boza-Quispe et al. [11] propose a user-friendly speech interface to access tourist semantic information based on the Google Cloud Platform. The flow has stages such as Text-to-Speech (TTS) and Speech-to-Text (STT) conversion, a web interface, a SPARQL generator and a semantic representation.

With the increased adoption of smartphones and other consumer devices, speech has become one of the main modes of interaction. Yanzhang He et al. [12] go beyond the separate acoustic (AM), pronunciation (PM) and language (LM) models of earlier large-vocabulary continuous speech recognition (LVCSR) systems while satisfying computational and memory constraints. Their end-to-end (E2E) speech recognizer, based on the RNN-T model, gives a 20% WER improvement over an embedded baseline system and runs about twice as fast as real time on a Google Pixel phone.

III. PROPOSED SYSTEM

The SIP client is integrated into an Android app that initiates a voice-activated conversation with the doctor or surgeon who intends to document the procedure or surgery performed on a patient. Typically, the doctor starts the conversation with the patient ID and then speaks as he or she normally would on a regular phone call. The FreeSWITCH IVR system records the doctor's speech, which is essentially the document that would otherwise be typed in the traditional way. The recorded audio is then processed by the server application, which communicates over the SIP and MRCP signalling protocols; the voice is transmitted between FreeSWITCH and the MRCP server via RTP packets. The MRCP server uses either the cloud-based Google Speech Recognition (GSR) API or the PocketSphinx module to transcribe the doctor's speech to text. The transcribed text returned by GSR or PocketSphinx is stored in the respective patient record database for presentation as part of the patient's Electronic Medical Record during clinical visits or reviews conducted by the doctor or surgeon in subsequent follow-ups. The transcribed documentation can be viewed as plain text at any time once the dictated document has been transcribed, in near real time. The proposed system is shown in Fig-1. The steps involved in the back-end are:

1. Call the IVR internal extension (e.g. 1000)
2. Announce the patient ID
3. Start audio recording or the voice mail option
4. Store the audio file (as patient_id.wav)
5. Notify/send a command to the uniMRCP server
6. Initiate speech-to-text / enable the speech-to-text API on Google Cloud Platform
7. Store the text file

A minimal illustrative sketch of steps 4-7 is given after Fig-1.

[Fig-1 (image): block diagram showing the SIP client, the open source scalable PBX (FreeSWITCH with mod_unimrcp), the uniMRCP server with its MRCP 2.0 speech recognition engine, and the Google Cloud Platform.]

Fig-1: Block Diagram of Cloud Based Medical Transcription
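The following is a minimal, illustrative sketch of back-end steps 4-7, using the Python SpeechRecognition library as a stand-in for the deployed FreeSWITCH + mod_unimrcp + uniMRCP pipeline. The file naming, the choice between the Google and PocketSphinx back-ends shown here, and the text-file storage are assumptions for demonstration only, not the system's actual implementation.

```python
# Illustrative stand-in for back-end steps 4-7: read the stored recording,
# transcribe it with an online (Google) or offline (PocketSphinx) engine,
# and store the resulting text next to the audio file.
# Assumes: pip install SpeechRecognition pocketsphinx
import speech_recognition as sr

def transcribe_patient_recording(patient_id, use_google=True):
    wav_path = f"{patient_id}.wav"                   # step 4: audio stored as patient_id.wav
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:           # steps 5-6: hand the audio to an engine
        audio = recognizer.record(source)
    if use_google:
        text = recognizer.recognize_google(audio)    # online Google speech recognition
    else:
        text = recognizer.recognize_sphinx(audio)    # offline CMU PocketSphinx
    with open(f"{patient_id}.txt", "w") as out:      # step 7: store the text file
        out.write(text)
    return text

if __name__ == "__main__":
    print(transcribe_patient_recording("patient_1000", use_google=False))
```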

IV. CONCLUSION

In this paper, emerging technologies are reviewed that offer flexible resources (cloud), open source cross-platform telephony performing multiple functions (Asterisk, FreeSwitch), and speech engines (ASR, Google Speech API, PocketSphinx). FreeSwitch is an open source EPBX like Asterisk but is more flexible and easier to scale and extend with modules of the user's choice, which makes it a reliable and convenient platform, summarized as one machine performing multiple tasks. Our proposed model therefore uses technologies such as uniMRCP, the open source EPBX FreeSwitch, and speech recognition engines supporting uniMRCP, namely Google Speech Recognition and CMU's PocketSphinx, for more accurate results; networking tools and emerging open source technologies are used throughout. The doctor saves the time otherwise spent manually documenting patient information after surgery by dictating notes through the Android application and transcription platform. Hence, the proposed system is cost effective and reliable, and, most importantly, it can be implemented on the cloud and needs no local system resources for storage. Because the solution is cloud based, a doctor can access it from anywhere, through any device with an internet connection, and store essentially unlimited patient data.
REFERENCES

[1] Ghassemi, Mohammad M., et al. "An open-source tool for the transcription of paper-spreadsheet data: Code and supplemental materials available online: https://github.com/deskool/images to spreadsheets." 2017 IEEE International Conference on Big Data (Big Data). IEEE, 2017.

[2] Jones, Richard W., and James E. Mateer. "Indirect risk related failures of Medical Information Systems." 2019 14th IEEE
Conference on Industrial Electronics and Applications (ICIEA). IEEE, 2019.

[3] Qadeer, Mohammed Abdul, Kanika Shah, and Utkarsh Goel. "Voice-video communication on mobile phones and PCs'
using asterisk EPBX." 2012 International Conference on Communication Systems and Network Technologies. IEEE, 2012.

[4] Liu, Dong, and Yue-Long Zhao. "A new approach to scalable ROIA in cloud." 2013 Fourth International Conference on
Emerging Intelligent Data and Web Technologies. IEEE, 2013.

[5] Ansari, Abdullah Mohammad, Md Faisal Nehal, and Mohammed Abdul Qadeer. "SIP-based interactive voice response
system using freeswitch epbx." 2013 Tenth International Conference on Wireless and Optical Communications Networks
(WOCN). IEEE, 2013.

[6] Amberkar, Aditya, et al. "Speech Recognition using Recurrent Neural Networks." 2018 International Conference on
Current Trends towards Converging Technologies (ICCTCT). IEEE, 2018.

[7] Manasa, Chadalavada Sai, K. Jeeva Priya, and Deepa Gupta. "Comparison of acoustical models of GMM-HMM based for
speech recognition in Hindi using PocketSphinx." 2019 3rd International Conference on Computing Methodologies and
Communication (ICCMC). IEEE, 2019.

[8] Tang, Wei, et al. "Design and implementation of information and communication dispatching system based on FreeSwitch platform." Journal of Physics: Conference Series. Vol. 1449. No. 1. IOP Publishing, 2020.

[9] Chunwijitra, Sila, et al. "A cloud-based framework for Thai large vocabulary speech recognition." 2016 13th
International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information
Technology (ECTI-CON). IEEE, 2016.

[10] Chunwijitra, Sila, et al. "Distributing and Sharing Resources for Automatic Speech Recognition Applications." 2019
22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech
Databases and Assessment Techniques (O-COCOSDA). IEEE, 2019.

[11] Boza-Quispe, Gustavo, et al. "A friendly speech user interface based on Google cloud platform to access a tourism
semantic website." 2017 CHILEAN Conference on Electrical, Electronics Engineering, Information and Communication
Technologies (CHILECON). IEEE, 2017.

[12] He, Yanzhang, et al. "Streaming end-to-end speech recognition for mobile devices." ICASSP 2019-2019 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019.
