Designing of a Novel Framework for Marathi Natural Language Processing: MR-LIWC2015

Saroj Date, Sachin N. Deshmukh, Ryan Boyd, Ashwini Ashokkumar, James W. Pennebaker

Research output: Contribution to journal › Article › peer-review


The role of linguistic analysis in understanding human behaviour, emotions, and psychological states has gained significant prominence in domains including psychology, the social sciences, and computational linguistics. Linguistic Inquiry and Word Count (LIWC), a widely used tool developed by American social psychologist James W. Pennebaker and his team at the University of Texas at Austin, enables automated linguistic analysis of text. This analysis provides insights into psychological and emotional dimensions. Originally developed for English, LIWC has been adapted to several other languages, including German, Dutch, Spanish, Chinese, Turkish, and French, but its applicability remains largely restricted to English and these few languages, limiting its use in multilingual contexts. In particular, the tool is not yet available for Marathi, a major language spoken by the people of Maharashtra, India. This paper presents a novel framework for the development and evaluation of a Marathi translation of the LIWC dictionary, aiming to extend its utility to the Marathi-speaking population. The Marathi version is based on the English LIWC-2015. The work is unique in that it is the first LIWC translation for any Indian language. Development of the Marathi LIWC includes several steps: initial translation and wildcard (*) expansion, dictionary expansion, linguistic analysis, wordlist development, cultural adaptation, wordlist validation, refinement, equivalence research, addition of summary variables, and compilation of the final dictionary in the official LIWC format. The Marathi LIWC is evaluated on a diverse dataset of Marathi text samples encompassing social media posts, speech transcripts, blogs, short stories, and book summaries. The performance of the translated dictionary is assessed on its ability to accurately capture the linguistic features, emotional tones, and psychological constructs present in Marathi text. To evaluate its effectiveness, the dataset was analyzed using both the original English LIWC and the newly developed Marathi LIWC. The results demonstrate that the Marathi LIWC maintains its alignment with the original LIWC's underlying linguistic and psychological dimensions while catering to the specifics of the Marathi language. The translated dictionary exhibited promising reliability and validity in capturing linguistic and psychological features within Marathi texts.
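The wildcard (*) expansion and word-count scoring mentioned in the abstract can be illustrated with a minimal sketch. It assumes the usual LIWC-style convention that an entry ending in `*` matches any token beginning with that stem, and that each category is reported as a percentage of total tokens. The dictionary entries, the category labels (modeled on LIWC abbreviations), and the function names below are hypothetical and are not taken from MR-LIWC2015; the whitespace tokenizer is a simplification rather than the authors' method.

```python
from collections import Counter

# Toy entries for illustration only; NOT taken from MR-LIWC2015.
# A trailing "*" is treated as a prefix wildcard (assumed convention).
TOY_DICTIONARY = {
    "आनंद*": ["posemo"],   # joy, happy, happily, ...
    "दुःख*": ["negemo"],   # sorrow, sad, ...
    "मी": ["i"],           # first-person singular pronoun
}

# Punctuation stripped from token edges, including the Devanagari danda.
PUNCTUATION = ".,!?;:\"'()।"


def tokenize(text):
    """Split on whitespace and strip punctuation from each token."""
    tokens = (tok.strip(PUNCTUATION) for tok in text.split())
    return [tok for tok in tokens if tok]


def categories_for(token, dictionary):
    """Return the set of categories whose entries match the token."""
    matched = set()
    for pattern, cats in dictionary.items():
        if pattern.endswith("*"):
            # Wildcard entry: match any token starting with the stem.
            if token.startswith(pattern[:-1]):
                matched.update(cats)
        elif token == pattern:
            matched.update(cats)
    return matched


def score(text, dictionary):
    """Return each category as a percentage of all tokens in the text."""
    tokens = tokenize(text)
    if not tokens:
        return {}
    counts = Counter()
    for token in tokens:
        counts.update(categories_for(token, dictionary))
    return {cat: 100.0 * n / len(tokens) for cat, n in counts.items()}


# "I am very happy today."
scores = score("मी आज खूप आनंदी आहे.", TOY_DICTIONARY)
print(scores)
```

Whitespace splitting is used here instead of a `\w+` regular expression because Python's `re` module does not treat Devanagari vowel signs as word characters, which would break Marathi words apart.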

Original language: English (US)
Pages (from-to): 1-14
Number of pages: 14
Journal: International Journal of Intelligent Systems and Applications in Engineering
Issue number: 11s
State: Published - Jan 11 2024


Keywords

  • English LIWC
  • LIWC
  • Marathi
  • Marathi LIWC
  • Marathi translation
  • NLP
  • Natural language processing
  • Sentiment analysis
  • translation
  • translation procedure
  • translation process

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Information Systems
  • Computer Graphics and Computer-Aided Design
  • Artificial Intelligence

