Publications

Augmenting NLP models using Latent Feature Interpolations

Published in COLING, 2020

Models with a large number of parameters are prone to over-fitting and often fail to capture the underlying input distribution. We introduce Emix, a data augmentation method that uses interpolations of word embeddings and hidden layer representations to construct virtual examples. We show that Emix yields significant improvements over previously used interpolation-based regularizers and data augmentation techniques. We also demonstrate that our proposed method is more robust to sparsification. We highlight the merits of our proposed methodology through thorough quantitative and qualitative assessments.

Recommended citation: Jindal, Amit, Arijit Ghosh Chowdhury, Aniket Didolkar, Di Jin, Ramit Sawhney, and Rajiv Shah. "Augmenting NLP models using Latent Feature Interpolations." In Proceedings of the 28th International Conference on Computational Linguistics, pp. 6931-6936. 2020. https://www.aclweb.org/anthology/2020.coling-main.611.pdf

SpeechMix-Augmenting Deep Sound Recognition using Hidden Space Interpolations

Published in INTERSPEECH, 2020

This paper presents SpeechMix, a regularization and data augmentation technique for deep sound recognition. Our strategy is to create virtual training samples by interpolating speech samples in hidden space. SpeechMix has the potential to generate an infinite number of new augmented speech samples since the combination of speech samples is continuous. Thus, it helps downstream models substantially reduce overfitting. Unlike other mixing strategies that only work on the input space, we apply our method to the intermediate layers to capture a broader representation of the feature space. Through an extensive quantitative evaluation, we demonstrate the effectiveness of SpeechMix in comparison to standard learning regimes and previously applied mixing strategies. Furthermore, we use an ablation study to highlight how different hidden layers contribute to the improvements in classification.

Recommended citation: Jindal, Amit, Narayanan Elavathur Ranganatha, Aniket Didolkar, Arijit Ghosh Chowdhury, Di Jin, Ramit Sawhney, and Rajiv Ratn Shah. "SpeechMix-Augmenting Deep Sound Recognition using Hidden Space Interpolations." Proc. Interspeech 2020 (2020): 861-865. http://www.interspeech2020.org/uploadfile/pdf/Mon-2-8-10.pdf
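
The hidden-space interpolation described in the SpeechMix abstract can be illustrated with a generic mixup-style sketch. This is not the authors' implementation: the function names `mix_hidden` and `sample_lambda`, the use of one-hot labels, and drawing the mixing weight from a Beta distribution are all assumptions borrowed from common mixup practice, and the abstract does not state which of them the paper uses.

```python
import random


def sample_lambda(alpha=0.2, rng=random):
    """Draw a mixing weight in [0, 1] from Beta(alpha, alpha).

    Assumption: the Beta prior is the usual mixup convention; the
    abstract does not specify how the weight is chosen.
    """
    return rng.betavariate(alpha, alpha)


def mix_hidden(h_a, h_b, y_a, y_b, lam):
    """Return the convex combination of two hidden vectors and their labels.

    h_a, h_b: hidden-layer activations of two samples (lists of floats).
    y_a, y_b: label vectors, assumed one-hot here (lists of floats).
    lam:      weight given to the first sample, between 0 and 1.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(h_a) != len(h_b) or len(y_a) != len(y_b):
        raise ValueError("paired vectors must have the same length")
    h_mix = [lam * a + (1.0 - lam) * b for a, b in zip(h_a, h_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return h_mix, y_mix


# Hypothetical activations and labels for two samples.
h_mix, y_mix = mix_hidden([1.0, 3.0], [3.0, 1.0], [1.0, 0.0], [0.0, 1.0], 0.25)
# h_mix == [2.5, 1.5], y_mix == [0.25, 0.75]
```

Because `lam` can take any value between 0 and 1, each pair of samples gives a continuum of virtual examples, which is the property the abstract points to when it mentions an infinite number of augmented samples.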

ARHNet-Leveraging Community Interaction for Detection of Religious Hate Speech in Arabic

Published in ACL - Student Research Workshop, 2019

The rapid spread of social media has led to some undesirable consequences, such as the rapid increase of hateful content and offensive language. Religious hate speech, in particular, often leads to unrest and sometimes escalates to violence against people on the basis of their religious affiliations. The richness of Arabic morphology and the limited available resources make this task especially challenging. The current state-of-the-art approaches to detecting hate speech in Arabic rely entirely on textual (lexical and semantic) cues. Our proposed methodology contends that leveraging community interaction can help us better profile hate speech content on social media. Our proposed ARHNet (Arabic Religious Hate Speech Net) model incorporates both Arabic word embeddings and social network graphs for the detection of religious hate speech.

Recommended citation: Chowdhury, Arijit Ghosh, Aniket Didolkar, Ramit Sawhney, and Rajiv Shah. "ARHNet-Leveraging Community Interaction for Detection of Religious Hate Speech in Arabic." In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop, pp. 273-280. 2019. https://www.aclweb.org/anthology/P19-2038.pdf

#YouToo? Detection of Personal Recollections of Sexual Harassment on Social Media

Published in ACL, 2019

The availability of large-scale online social data, coupled with computational methods, can help us answer fundamental questions relating to our social lives, particularly our health and well-being. The #MeToo trend has led to people talking about personal experiences of harassment more openly. This work attempts to aggregate such experiences of sexual abuse to facilitate a better understanding of social media constructs and to bring about social change. It has been found that disclosure of abuse has positive psychological impacts. Hence, we contend that such information can be leveraged to create better campaigns for social change by analyzing how users react to these stories and to obtain better insight into the consequences of sexual abuse. We use a three-part Twitter-specific social media language model to segregate personal recollections of sexual harassment from Twitter posts. An extensive comparison with state-of-the-art generic and specific models, along with a detailed error analysis, explores the merit of our proposed model.

Recommended citation: Chowdhury, Arijit Ghosh, Ramit Sawhney, Rajiv Shah, and Debanjan Mahata. "#YouToo? Detection of Personal Recollections of Sexual Harassment on Social Media." In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 2527-2537. 2019. https://www.aclweb.org/anthology/P19-1241.pdf