A seasoned Computer Scientist with 15 years of expertise in Natural Language Processing, specializing in Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems.
Natural Language Processing, Machine Learning, Big Data Analytics, Pattern Mining, Computational Linguistics, Semantic Web, Cognitive Computing, Distributed Computing, XML, and Database systems.
Sr. Tech Lead (IC Sr. Manager), Data Science (AI & NLP), Enterprise Data Science, CapitalOne, Cambridge, MA
Led the IR Team for Agent Assist in developing a state-of-the-art RAG system to support thousands of customer support agents across various lines of business.
Enhanced IR systems by integrating cross-encoder re-rankers, hybrid search methodologies, and dynamic configurations. Key optimization strategies included user preference re-ranking, ranked fusion, entropy-based selection of top preferences, and nucleus thresholding for retrieval.
Directed high-performing teams in designing, implementing, and optimizing deep learning models, employing advanced techniques such as model perturbations to improve robustness, generalization, and adversarial resilience.
Managed end-to-end NLP projects, from conceptualization and data acquisition to model training, evaluation, and deployment, ensuring effective utilization of LLMs.
Developed and fine-tuned deep learning models for question answering and text generation, focusing on scalable and efficient NLP system deployment.
Collaborated with cross-functional teams—including researchers, data scientists, software engineers, and business stakeholders—to translate complex business requirements into actionable technical strategies, ensuring successful project execution and impactful results in AI, NLP, and deep learning.
Tech Lead (IC Manager), Data Science (AI & NLP), Enterprise Data Science, CapitalOne, Cambridge, MA
Led the IR Team for Agent Assist to build a SOTA RAG system supporting thousands of customer support agents across several lines of business (LoBs).
Led high-performing teams in the design, implementation, and optimization of deep learning models, leveraging advanced techniques such as model perturbations to enhance robustness, generalization, and adversarial resilience.
Built and managed end-to-end NLP projects, from conceptualization and data acquisition to model training, evaluation, and deployment, ensuring the effective utilization of LLMs.
Built several deep learning models for question answering and text generation, with a focus on implementing and fine-tuning LLMs and deploying scalable, efficient NLP systems.
Collaborated with cross-functional teams, including researchers, data scientists, software engineers, and business stakeholders, to translate complex business requirements into actionable technical strategies, drive successful project execution, and deliver impactful results in the areas of AI, NLP and deep learning.
Value Stream Manager and Technology Lead, AI in PMS applications, Philips Research North America, Cambridge, MA
Senior Scientist, Philips Research North America, Cambridge, MA
Scientist, Philips Research North America, Cambridge, MA
Postdoctoral Research Associate, Pacific Northwest National Laboratory, Richland, WA
Document the performance of the QA system with PACS (Technical Note PR-TN 2018/00629): Intelligent Product Support Assistant: AI-driven Approach for Just-in-time Customer Support (Healthcare Informatics)
Distributed Workflow: Deployable KIE Service (Data Science Platform)
De-Identification Services: De-Id and Linking Pipeline (Data Science Platform)
De-Identification Services: DICOM De-Identification Service (Data Science Platform)
Knowledge Graph-based Clinical Diagnosis System for DSP (Data Science Platform)
Clinical Knowledge Base Asset for DSP (Data Science Platform)
Condensed memory neural networks for clinical question answering: transfer of model as asset on DSP (Data Science Platform)
ICON Semantic Search Module for Radiology Reports (EI/CHI)
Umashanthi Pavalanathan, Vivek Datla, Svitlana Volkova, Lauren Charles-Smith, Meg Pirrung, Josh Harrison, Alan Chappell, Courtney D Corley (2017). Studying Military Community Health, Well-Being, and Discourse Through the Social Media Lens, 87-105.
Datla, V.V., Lin, King-Ip, & Louwerse, M.M. (2014). Linguistic features predict the truthfulness of short political statements. International Journal of Computational Linguistics and Applications, 5(1), 79-94
Ionescu, B., Müller, H., Péteri, R., Cid, Y.D., Liauchuk, V., Kovalev, V., Klimuk, D., Tarasau, A., Abacha, A.B., Hasan, S.A. and Datla, V., 2019, September. ImageCLEF 2019: Multimedia retrieval in medicine, lifelogging, security and nature. In International Conference of the Cross-Language Evaluation Forum for European Languages (pp. 358-386). Springer, Cham.
Pandey, Rahul, Md Shamsuzzaman, Sadid A. Hasan, Mohammad S. Sorower, Md Abdullah Al Hafiz Khan, Joey Liu, Vivek Datla et al. “BoostER: A Performance Boosting Module for Biomedical Entity Recognition.” In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 2554-2560. IEEE, 2019.
Abacha, Asma Ben, Sadid A. Hasan, Vivek V. Datla, Joey Liu, Dina Demner-Fushman, and Henning Müller. “VQA-Med: Overview of the medical visual question answering task at imageclef 2019.” In CLEF2019 Working Notes. CEUR Workshop Proceedings, pp. 09-12. 2019.
Khan, Md Abdullah Al Hafiz, Md Shamsuzzaman, Sadid A. Hasan, Mohammad S. Sorower, Joey Liu, Vivek Datla, Mladen Milosevic, Gabe Mankovich, Rob van Ommering, and Nevenka Dimitrova. “Improving Disease Named Entity Recognition for Clinical Trial Matching.” In 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 2541-2548. IEEE, 2019.
Hasan, Sadid A., Yuan Ling, Joey Liu, Rithesh Sreenivasan, Shreya Anand, Tilak Raj Arora, Vivek Datla et al. “Attention-based medical caption generation with image modality classification and clinical concept mapping.” In International Conference of the Cross-Language Evaluation Forum for European Languages, pp. 224-230. Springer, Cham, 2018.
Adduru, V., Hasan, S. A., Liu, J., Ling, Y., Datla, V., Lee, K., … & Farri, O. (2018). Towards dataset creation and establishing baselines for sentence-level neural clinical paraphrase generation and simplification. IJCAI-ECAI, 2018.
Vivek Datla, Tilak Arora, Joey Liu, Viraj Adduru, Sadid A. Hasan, Kathy Lee, Ashequl Qadir, Yuan Ling, Aaditya Prakash and Oladimeji Farri (2017, Oct) Open domain real-time question answering based on asynchronous multiperspective context-driven retrieval and neural paraphrasing. TREC, 2017
Kathy Lee, Ashequl Qadir, Yuan Ling, Joey Liu, Sadid A. Hasan, Vivek Datla, Aaditya Prakash and Oladimeji Farri. Recognizing Tweet Relevance with Profile-specific and Profile-independent Supervised Models. TREC, 2017.
Yuan Ling, Sadid A. Hasan, Joey Liu, Kathy Lee, Vivek Datla, Ashequl Qadir, Oladimeji Farri, Michele Filannino, William Boag, Di Jin, Kevin P. Buchan, and Ozlem Uzuner. A Hybrid Approach to Precision Medicine-related Biomedical Article Retrieval and Clinical Trial Matching. TREC, 2017.
Vivek Datla, Sadid A. Hasan, Ashequl Qadir, Kathy Lee, Yuan Ling, Joey Liu, and Oladimeji Farri, Automated Clinical Diagnosis: The Role of Content in Various Sections of a Clinical Document, BIBM’BHI 2017
Ling, Y., Hasan, S. A., Datla, V., Qadir, A., Lee, K., Liu, J., & Farri, O. Diagnostic Inferencing via Improving Clinical Concept Extraction with Deep Reinforcement Learning: A Preliminary Study. Machine Learning for Healthcare (MLHC), 2017.
Lee, K., Qadir, A., Hasan, S. A., Datla, V., Prakash, A., Liu, J., & Farri, O. (2017, April). Adverse Drug Event Detection in Tweets with Semi-Supervised Convolutional Neural Networks. In Proceedings of the 26th International Conference on World Wide Web (pp. 705-714). International World Wide Web Conferences Steering Committee.
Prakash, Aaditya, Siyuan Zhao, Sadid A. Hasan, Vivek Datla, Kathy Lee, Ashequl Qadir, Joey Liu, and Oladimeji Farri. “Condensed Memory Networks for Clinical Diagnostic Inferencing.” (2017).
Vivek Datla, Sadid A. Hasan, Joey Liu, Kathy Lee, Ashequl Qadir, Aaditya Prakash, Oladimeji Farri. “Open Domain Real-Time Question Answering Based on Semantic and Syntactic Question Similarity.” TREC, 2016.
Hasan, Sadid A., Siyuan Zhao, Vivek Datla, Joey Liu, Kathy Lee, Ashequl Qadir, Aaditya Prakash, and Oladimeji Farri. “Clinical question answering using key-value memory networks and knowledge graph.” TREC, 2016.
Hasan, Sadid A., Bo Liu, Joey Liu, Ashequl Qadir, Kathy Lee, Vivek Datla, Aaditya Prakash, and Oladimeji Farri. “Neural Clinical Paraphrase Generation with Attention.” ClinicalNLP 2016 (2016): 42.
Aaditya Prakash, Sadid A Hasan, Kathy Lee, Vivek Datla, Ashequl Qadir, Joey Liu, Oladimeji Farri (2016). Neural Paraphrase Generation with Stacked Residual LSTM Networks. Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2923–2934, Osaka, Japan, December 11-17 2016.
Pavalanathan, Umashanthi, Vivek Datla, Svitlana Volkova, Lauren Charles-Smith, Meg Pirrung, Josh Harrison, Alan Chappell, and Courtney D. Corley (2016). Discourse, Health and Well-being of Military Populations Through the Social Media Lens. In Proceedings of W3PHI 2016.
Morrison, D. M., Nye, B., Samei, B., Datla, V. V., Kelly, C., & Rus, V. (2014). Building an intelligent PAL from the Tutor.com session database, phase 1: data mining. In Proceedings of the 7th International Conference on Educational Data Mining (pp. 335-336).
Datla, V.V., Louwerse, M.M., & Lin, King-Ip (2014). Part of Speech Induction from Distributional Features: Balancing Vocabulary and Context. In William Eberle & Chutima Boonthum-Denecke (Eds.), Proceedings of the Twenty-Seventh International Florida Artificial Intelligence Research Society Conference (pp. 28-32). AAAI Press.
Datla, V., King-Ip Lin, & Louwerse, M. M. Capturing disease-symptom relations using higher-order co-occurrence algorithms, 2012 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
Hutchinson, S., Datla, V., & Louwerse, M. M. Social networks are encoded in language. Proceedings of the 34th Annual Conference of the Cognitive Science Society. Sapporo, Japan: Cognitive Science Society.
Tillman, R., Datla, V., Hutchinson, S., & Louwerse, M. M. From head to toe: Embodiment through statistical linguistic frequencies. Proceedings of the 34th Annual Conference of the Cognitive Science Society. Sapporo, Japan: Cognitive Science Society.
King-Ip Lin, Datla, V., Morrison, L., Louwerse, M., “Using a feedback system to enhance chart note quality in Electronic Health Records,” 2011 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 649-654, 12-15 Nov. 2011.
Wu, Q., Datla, V., On performance modeling and prediction in support of scientific workflow optimization. In: Proceedings of the 7th IEEE World Congress on Services. Washington DC (Jul 4-9 2011)
Q. Wu, S. Shiva, S. Roy, C. Ellis, V. Datla, and D. Dasgupta. On Modeling and Simulation of Game Theory-based Defense Mechanisms against DoS and DDoS Attacks. 43rd Annual Simulation Symposium (ANSS10), part of the 2010 Spring Simulation MultiConference, April 11-15, 2010.
Franklin, S., Ramamurthy, U., D’Mello, S., McCauley, L., Negatu, A., Silva R., & Datla, V. (2007). LIDA: A computational model of global workspace theory and developmental learning. In AAAI Fall Symposium on AI and Consciousness: Theoretical Foundations and Current Approaches. Arlington, VA: AAAI.
Vivek Datla, Sadid A. Hasan, Joey Liu, Kathy Lee, Ashequl Qadir, Aaditya Prakash, Oladimeji Farri. “Open Domain Real-Time Question Answering Based on Semantic and Syntactic Question Similarity.” TREC, 2016.
Datla, V.V., Lin, King-Ip, & Louwerse, M.M. (2014). Linguistic features predict the truthfulness of short political statements, CICLING 2014
Vivek Datla, King-Ip Lin, M.M. Louwerse (2013). Language encodes verifiability of statements. University of Memphis, Research Day.
Louwerse, M.M., Baskar, L., Datla, V., Lin, K., Morrison, L. (2011). Linguistic features in medical chart notes: How language features benefit our health. Paper presented at the 21st Annual Meeting of the Society for Text and Discourse. Poitiers, France.
Datla, V.V., Ellis, C., Roy, S. and Sajjan, S (2011) Game Theory-based Defense Mechanisms against DoS and DDoS Attacks.
Datla, V., & Vishnu, A. (2015). Predicting the top and bottom ranks of billboard songs using Machine Learning. arXiv preprint arXiv:1512.01283.
Datla, V., Lin, D., Louwerse, M., & Vishnu, A. (2016). A Data-Driven Approach for Semantic Role Labeling from Induced Grammar Structures in Language. arXiv preprint arXiv:1606.06274.