Vivek V Datla


A seasoned Computer Scientist with 15 years of expertise in Natural Language Processing, specializing in Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems.


Education

2009-2014
Ph.D., Computer Science; University of Memphis (Tennessee, USA)
2006-2008
M.S., Computer Science; University of Memphis (Tennessee, USA)
2001-2005
B.Tech, Electronics and Communication Engineering; JNTU (Telangana, India)

Career Highlights

Expertise

Natural Language Processing, Machine Learning, Big Data Analytics, Pattern Mining, Computational Linguistics, Semantic Web, Cognitive Computing, Distributed Computing, XML, and Database Systems.

Technologies

Granted Patents

62/793,611
A system for multi-perspective discourse within a set of conversation standards
62/411,947
MUDRA: Multi-Domain Real-Time Question Answering System
16/430,676
Neural Text Simplification by Jointly Learning Semantic Alignment and Simplicity
16/430,788
Open-domain real-time question answering
62/401,293
Systems and Methods for Question Generation with Fact-based Attentive Recurrent Neural Networks
62/454,089
Systems and Methods to Optimize Clinical Decision Support with Deep Reinforcement Learning
16/329,959
Semi-supervised classification with stacked autoencoder
16/330,174
Systems and Methods for Diagnostic Inferencing with Multimodal Deep Memory Networks
15/707,550
Condensed Memory Networks
62/406,427
Patient-centric Clinical Knowledge Discovery System using Deep Learning, NLP and Voice Services
16/334,135
Systems and Methods for Question Generation with Fact-based Attentive Recurrent Neural Networks
16/491,489
Drawing conclusions from free form texts with deep reinforcement learning
62/772,764
CRF-based Span Prediction for Fine Machine Reading Comprehension
16/979,199
Haptic input text generation

Professional Experience

2024-Present

Sr. Tech Lead (IC Sr. Manager), Data Science (AI & NLP), Enterprise Data Science, Capital One, Cambridge, MA

2022-2024

Tech Lead (IC Manager), Data Science (AI & NLP), Enterprise Data Science, Capital One, Cambridge, MA

2021-2022

Value Stream Manager and Technology Lead, AI in PMS applications, Philips Research North America, Cambridge, MA

2017-2021

Senior Scientist, Philips Research North America, Cambridge, MA

2015-2017

Scientist, Philips Research North America, Cambridge, MA

2014-2015

Postdoctoral Research Associate at Pacific Northwest National Laboratory, Richland, WA

2013

Data Analyst Intern (ASTRO program) at Oak Ridge National Laboratory, Oak Ridge, TN

2008-2009

Software Engineer, Verified Person Inc., Memphis, TN

Academic Research Experience

2011-2014
Graduate Researcher, Multimodal Aspects of Discourse Research Lab, FedEx Inst. of Tech., Memphis, TN
2013-2014
Student Researcher, IIS, FedEx Inst. of Tech., Memphis, TN
2010-2011
High Perf. Comp. and Networking Lab, Univ. of Memphis, TN
2009-2010
Game Theory for Comp. Security Lab, Univ. of Memphis, TN
2006-2008
Cognitive Computing Research Group, FedEx Inst. of Tech., Memphis, TN

Awards & Honors

Filed Patents

62/891,787
System for Automated Dynamic Guidance for DIY Projects
62/869,075
Multi-Pass Fine Reading for Machine Comprehension
62/777,278
Systems and methods for augmented reality enhanced field services support
62/681,123
Open domain real-time question answering based on asynchronous multi perspective context driven retrieval and neural paraphrasing
62/680,660
Neural Text Simplification by Jointly Learning Semantic Alignment and Simplicity
62/551,496
Recognizing Emotions in Social Media with Guided Co-training
62/531,147
COMPANION: An Ever Learning Intelligent System for Improved Quality of Life
62/484,602
DBrain: A System to Infer Diagnoses from Clinical Notes with Deep Reinforcement Learning
62/454,085
An Ensemble-based Iterative Classification Framework for Recognizing Emotion in Text
62/415,541
Classification of Cognitive Bias in Microblogs relative to Healthcare-centric Evidence
62/412,329
Knowledge Graph-based Clinical Diagnosis Assistant
62/411,907
Meeting User Information Needs with Personalized Monitoring of the Real-Time Streaming Data
62/384,250
A Deep Learning-based Semi-Supervised Approach for Text Classification
62/384,235
Systems and Methods for Diagnostic Inferencing with Multimodal Deep Memory Networks
62/377,778
Knowledge Discovery from Social Media and Biomedical Literature for Adverse Drug Events

Invention Disclosures

AI and Machine Learning for Healthcare

2021ID00749
AI Driven complaint mapper to improve Philips Labeling and Internal Documentation (SRAs)
2020ID02075
An approach to generate partially clinically relevant synthetic electronic health records
2020ID01445
Concept mapping using joint classification with natural language processing and distribution models of clinical feature values
2018ID00107
A method for identifying abnormal neurological development from MRI images for the neonatal patients
2017ID05628
Enhanced workflow management system for medical diagnosis based on phenotyping deltas
2017ID05578
A system for modelling patient conditions using Markov logic network
2017ID03450
DBrain: A System to Infer Diagnoses from Clinical Notes with Deep Reinforcement Learning
2016ID02000
Condensed Memory Networks for Diagnostic Inferencing from Free Text Clinical Notes
2016ID01819
Knowledge Graph-based Clinical Diagnosis Assistant
2016ID01736
Systems and Methods for Diagnostic Inferencing with Multimodal Deep Memory Networks
2016ID00331
Patient-centric Clinical Knowledge Discovery System using Deep Learning, NLP and Voice Services

Natural Language Processing and Text Analysis

2020ID02067
A Method for Assessing Sentence Importance in Text Classification
2020ID01789
A framework and method for identifying relevant phrases about medical devices issues from a long text
2020ID01708
Free Text Concept Classification with Domain Invariance
2019ID01837
Improved evaluation metric for table to text conversion
2019ID01252
Improved coverage for table to text generation
2019ID01139
Improving the performance of disease NER for Clinical Trial Matching
2018ID01346
Multi-Pass Fine Reading for Machine Comprehension
2018ID01496
Multi-Pass Span Prediction for Fine Machine Reading Comprehension
2018ID02179
CRF-based Span Prediction for Fine Machine Reading Comprehension
2017ID03544
A Method for Automatically Constructing a Dictionary of Figurative Description of Illness
2017ID03449
Neural Text Simplification by Jointly Learning Semantic Alignment and Simplicity
2016ID02347
Idea Density-enhanced Named Entity Recognition to Detect Cognitive Impairment in the Elderly

Question Answering and Information Retrieval Systems

2019ID02377
Automatic PDF document digestion into a live QA system
2019ID02314
A system for personalized language-agnostic document retrieval
2018ID00556
System and method for personalized physiology-aware question answering
2018ID02354
Novel Retrieval Architecture for Treatment-Related Biomedical Articles and Clinical Trials
2017ID05255
Open domain real-time question answering based on asynchronous multi perspective context driven retrieval and neural paraphrasing
2016ID02137
MUDRA: Multi-Domain Real-Time Question Answering System
2016ID01150
A Method to use Neural Semantic Similarity in Ranking Answers to Live Questions

Medical Device and Product Support

2020ID02127
Power Monitoring for Medical Devices Failure Prediction and Identification
2020ID02125
AI framework for detecting data completeness to improve field service management and complaint handling workflow
2020ID01927
Semantic Mapping of Errors, Logs and Resolution through unified joint representation
2020ID01781
Investigation Difficulty Assessment of Product Complaints with Language Models of Heterogeneous Domain Corpora
2020ID01780
A Framework for Automatic Identification of Recurring Product Quality Issues from Customer and Service Engineer-Reported Free Text Data
2020ID01312
System and methods for collecting error log information from medical devices in product support lifecycle
2019ID01002
A System for Automatically Identifying the State and Errors for box devices

Augmented Reality and Interactive Systems

2018ID00792
Systems and methods for augmented reality enhanced field services support
2018ID01276
System for Automated Dynamic Guidance for DIY Projects
2016ID02434
Addressing Cognitive Impairment in the Elderly using Dialogue Systems and Augmented Reality

Emotion and Sentiment Analysis

2017ID04760
Recognizing Emotions in Social Media with Guided Co-training
2017ID03040
An Ensemble-based Iterative Classification Framework for Recognizing Emotion in Text

Data Management and Standardization

2020ID01444
Standardized Reporting Tool for Hospital Data
2019ID00954
Tool and Framework for the Curation of Clinical Trials and Records from Unstructured Texts

Machine Learning and AI Improvements

2020ID00002
A Semi-supervised Framework for Modeling Classification Errors
2019ID02424
Iterative instance selection to reduce annotation errors associated with multilabel instances
2016ID01750
A Deep Learning-based Semi-Supervised Approach for Text Classification

Personalized and Context-Aware Systems

2018ID02676
A System for situational awareness using context driven embeddings
2016ID01988
COMPANION: An Ever Learning Intelligent System for Improved Quality of Life
2016ID01931
Meeting User Information Needs with Personalized Monitoring of the Real-Time Streaming Data

Others

2020ID01212
Language-Agnostic Code Recommendation without Translation
2019ID02273
Grounding clinical notes with numerical data to enhance clinical decision support
2019ID00990
An interactive annotation interface for human-in-the-loop information retrieval and extraction
2018ID00555
System and methods for contextual symptom capturing based on physiological sensing
2018ID00611
A system for multi-perspective discourse within a set of conversation standards
2018ID01400
AI-Enabled Interruption Handling Intelligent Agent
2017ID03653
Touch-to-Text Text Generation based on Haptic Signals from Clinical Palpation
2017ID03081
Systems and Methods to Optimize Clinical Decision Support with Deep Reinforcement Learning
2016ID01213
MEDFLIX: Interactive Video-based Summarization of Electronic Medical Records
2016ID00529
Knowledge Discovery from Social Media and Biomedical Literature for Adverse Drug Events
2016ID00332
Classification of Cognitive Bias in Microblogs relative to Healthcare-centric Evidence

Technology Transfers to Business

Publications

Journals

Public Health Intelligence and the Internet

Umashanthi Pavalanathan, Vivek Datla, Svitlana Volkova, Lauren Charles-Smith, Meg Pirrung, Josh Harrison, Alan Chappell, Courtney D Corley (2017). Studying Military Community Health, Well-Being, and Discourse Through the Social Media Lens (pp. 87-105).

IJCLA

Datla, V.V., Lin, King-Ip, & Louwerse, M.M. (2014). Linguistic features predict the truthfulness of short political statements. International Journal of Computational Linguistics and Applications, 5(1), 79-94.

Conferences and Workshops

CLEF’19

Ionescu, B., Müller, H., Péteri, R., Cid, Y.D., Liauchuk, V., Kovalev, V., Klimuk, D., Tarasau, A., Abacha, A.B., Hasan, S.A. and Datla, V., 2019, September. ImageCLEF 2019: Multimedia retrieval in medicine, lifelogging, security and nature. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 358-386). Springer, Cham.

BIBM’19

Pandey, Rahul, Md Shamsuzzaman, Sadid A. Hasan, Mohammad S. Sorower, Md Abdullah Al Hafiz Khan, Joey Liu, Vivek Datla et al. “BoostER: A Performance Boosting Module for Biomedical Entity Recognition.” In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 2554-2560. IEEE, 2019.

CLEF’19

Abacha, Asma Ben, Sadid A. Hasan, Vivek V. Datla, Joey Liu, Dina Demner-Fushman, and Henning Müller. “VQA-Med: Overview of the medical visual question answering task at ImageCLEF 2019.” In CLEF 2019 Working Notes. CEUR Workshop Proceedings, September 9-12, 2019.

BIBM’19

Khan, Md Abdullah Al Hafiz, Md Shamsuzzaman, Sadid A. Hasan, Mohammad S. Sorower, Joey Liu, Vivek Datla, Mladen Milosevic, Gabe Mankovich, Rob van Ommering, and Nevenka Dimitrova. “Improving Disease Named Entity Recognition for Clinical Trial Matching.” In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 2541-2548. IEEE, 2019.

CLEF’18

Hasan, Sadid A., Yuan Ling, Joey Liu, Rithesh Sreenivasan, Shreya Anand, Tilak Raj Arora, Vivek Datla et al. “Attention-based medical caption generation with image modality classification and clinical concept mapping.” In International Conference of the Cross-Language Evaluation Forum for European Languages, pp. 224-230. Springer, Cham, 2018.

NAACL’18

Ghaeini, Reza, Sadid A. Hasan, Vivek Datla, Joey Liu, Kathy Lee, Ashequl Qadir, Yuan Ling, Aaditya Prakash, Xiaoli Fern, and Oladimeji Farri. “DR-BiLSTM: Dependent Reading Bidirectional LSTM for Natural Language Inference.” In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 1460-1469. 2018.

IJCAI-ECAI’18

Adduru, V., Hasan, S. A., Liu, J., Ling, Y., Datla, V., Lee, K., … & Farri, O. (2018). Towards dataset creation and establishing baselines for sentence-level neural clinical paraphrase generation and simplification. IJCAI-ECAI, 2018.

TREC’17

Vivek Datla, Tilak Arora, Joey Liu, Viraj Adduru, Sadid A. Hasan, Kathy Lee, Ashequl Qadir, Yuan Ling, Aaditya Prakash and Oladimeji Farri (2017, Oct) Open domain real-time question answering based on asynchronous multiperspective context-driven retrieval and neural paraphrasing. TREC, 2017

TREC’17

Kathy Lee, Ashequl Qadir, Yuan Ling, Joey Liu, Sadid A. Hasan, Vivek Datla, Aaditya Prakash and Oladimeji Farri. Recognizing Tweet Relevance with Profile-specific and Profile-independent Supervised Models. TREC, 2017.

TREC’17

Yuan Ling, Sadid A. Hasan, Joey Liu, Kathy Lee, Vivek Datla, Ashequl Qadir, Oladimeji Farri, Michele Filannino, William Boag, Di Jin, Kevin P. Buchan, and Ozlem Uzuner. A Hybrid Approach to Precision Medicine-related Biomedical Article Retrieval and Clinical Trial Matching. TREC, 2017.

BHI’17

Vivek Datla, Sadid A. Hasan, Ashequl Qadir, Kathy Lee, Yuan Ling, Joey Liu, and Oladimeji Farri, Automated Clinical Diagnosis: The Role of Content in Various Sections of a Clinical Document, BIBM’BHI 2017

MLHC’17

Ling, Y., Hasan, S. A., Datla, V., Qadir, A., Lee, K., Liu, J., & Farri, O. Diagnostic Inferencing via Improving Clinical Concept Extraction with Deep Reinforcement Learning: A Preliminary Study. Machine Learning for Healthcare (MLHC), 2017.

WWW’17

Lee, K., Qadir, A., Hasan, S. A., Datla, V., Prakash, A., Liu, J., & Farri, O. (2017, April). Adverse Drug Event Detection in Tweets with Semi-Supervised Convolutional Neural Networks. In Proceedings of the 26th International Conference on World Wide Web (pp. 705-714). International World Wide Web Conferences Steering Committee.

AAAI’17

Prakash, Aaditya, Siyuan Zhao, Sadid A. Hasan, Vivek Datla, Kathy Lee, Ashequl Qadir, Joey Liu, and Oladimeji Farri. “Condensed Memory Networks for Clinical Diagnostic Inferencing.” (2017).

TREC’16

Vivek Datla, Sadid A. Hasan, Joey Liu, Kathy Lee, Ashequl Qadir, Aaditya Prakash, Oladimeji Farri. “Open Domain Real-Time Question Answering Based on Semantic and Syntactic Question Similarity.” TREC, 2016.

TREC’16

Hasan, Sadid A., Siyuan Zhao, Vivek Datla, Joey Liu, Kathy Lee, Ashequl Qadir, Aaditya Prakash, and Oladimeji Farri. “Clinical question answering using key-value memory networks and knowledge graph.” TREC, 2016.

COLING ClinicalNLP’16

Hasan, Sadid A., Bo Liu, Joey Liu, Ashequl Qadir, Kathy Lee, Vivek Datla, Aaditya Prakash, and Oladimeji Farri. “Neural Clinical Paraphrase Generation with Attention.” ClinicalNLP 2016 (2016): 42.

COLING’16

Aaditya Prakash, Sadid A Hasan, Kathy Lee, Vivek Datla, Ashequl Qadir, Joey Liu, Oladimeji Farri (2016). Neural Paraphrase Generation with Stacked Residual LSTM Networks. Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2923–2934, Osaka, Japan, December 11-17 2016.

AAAI W3PHI’16

Pavalanathan, Umashanthi, Vivek Datla, Svitlana Volkova, Lauren Charles-Smith, Meg Pirrung, Josh Harrison, Alan Chappell, and Courtney D. Corley (2016). Discourse, Health and Well-being of Military Populations Through the Social Media Lens. In Proceedings of W3PHI 2016.

EDM’14

Morrison, D. M., Nye, B., Samei, B., Datla, V. V., Kelly, C., & Rus, V. (2014). Building an intelligent PAL from the tutor.com session database, phase 1: Data mining. In Proceedings of the 7th International Conference on Educational Data Mining (pp. 335-336).

FLAIRS’14

Datla, V.V., Louwerse, M.M., & Lin, King-Ip (2014). Part of Speech Induction from Distributional Features: Balancing Vocabulary and Context. In William Eberle & Chutima Boonthum-Denecke (Eds.), Proceedings of the Twenty-Seventh International Florida Artificial Intelligence Research Society Conference (pp. 28-32). AAAI Press.

BIBMW’13

Datla, V., King-Ip Lin, & Louwerse, M. M. Capturing disease-symptom relations using higher-order co-occurrence algorithms, 2012 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

COGSCI’12

Hutchinson, S., Datla, V., & Louwerse, M. M. Social networks are encoded in language. Proceedings of the 34th Annual Conference of the Cognitive Science Society. Sapporo, Japan: Cognitive Science Society.

COGSCI’12

Tillman, R., Datla, V., Hutchinson, S., & Louwerse, M. M. From head to toe: Embodiment through statistical linguistic frequencies. Proceedings of the 34th Annual Conference of the Cognitive Science Society. Sapporo, Japan: Cognitive Science Society.

BIBMW’11

Lin, King-Ip, Datla, V., Morrison, L., & Louwerse, M. “Using a feedback system to enhance chart note quality in Electronic Health Records.” 2011 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 649-654, 12-15 Nov. 2011.

SERVICES’11

Wu, Q., & Datla, V. On performance modeling and prediction in support of scientific workflow optimization. In Proceedings of the 7th IEEE World Congress on Services, Washington, DC, Jul 4-9, 2011.

ANSS’2010

Q. Wu, S. Shiva, S. Roy, C. Ellis, V. Datla, and D. Dasgupta. On Modeling and Simulation of Game Theory-based Defense Mechanisms against DoS and DDoS Attacks. 43rd Annual Simulation Symposium (ANSS10), part of the 2010 Spring Simulation MultiConference, April 11-15, 2010.

AAAI’07

Franklin, S., Ramamurthy, U., D’Mello, S., McCauley, L., Negatu, A., Silva, R., & Datla, V. (2007). LIDA: A computational model of global workspace theory and developmental learning. In AAAI Fall Symposium on AI and Consciousness: Theoretical Foundations and Current Approaches. Arlington, VA: AAAI.

Poster Presentations

TREC’16

Vivek Datla, Sadid A. Hasan, Joey Liu, Kathy Lee, Ashequl Qadir, Aaditya Prakash, Oladimeji Farri. “Open Domain Real-Time Question Answering Based on Semantic and Syntactic Question Similarity.” TREC, 2016.

CICLING’14

Datla, V.V., Lin, King-Ip, & Louwerse, M.M. (2014). Linguistic features predict the truthfulness of short political statements. CICLing 2014.

UoM’13

Vivek Datla, King-Ip Lin, M.M. Louwerse (2013). Language encodes verifiability of statements. University of Memphis Research Day.

ST&D’11

Louwerse, M.M., Baskar, L., Datla, V., Lin, K., Morrison, L. (2011). Linguistic features in medical chart notes: How language features benefit our health. Paper presented at the 21st Annual Meeting of the Society for Text and Discourse. Poitiers, France.

UoM’11

Datla, V.V., Ellis, C., Roy, S., and Shiva, S. (2011). Game Theory-based Defense Mechanisms against DoS and DDoS Attacks.

Pre-print

Invited Talks and Presentations

Professional Activities