Academic Research

Here are some highlights of my academic interests. See the complete list of my publications below or on Google Scholar.

Computational Argumentation and Debating

Research in computational argumentation focuses on understanding and generating arguments, with applications in debating systems and listening comprehension.

Selected Publications

Personalized Machine Translation

We proposed personalizing machine translation systems, adapting automatic translation from two perspectives:

  • For individual users' preferences
  • For preserving authors' styles through translation

Selected Publications

Model-aware Improvement of Source Translatability

Research in improving source translatability focuses on modifying source texts to enhance translation quality.

Selected Publications

Academic Service

Program Committee Member

W-NUT 2026 (expected) // ACL 2024 (Area Chair) // W-NUT 2022 // ARR April 2022 / ACL 2022 (ARR) // W-NUT 2021 // EMNLP 2021 // EACL 2021 // COLING 2020 // *SEM 2020 // EMNLP 2020 // ACL 2020 // LREC 2020 // W-NUT 2019 // ACL 2019 // NLP+CSS 2019 // COLING 2018 // ACL 2018 // NAACL 2018 // EMNLP 2017 // *SEM 2017 // ACL 2017 // Journal of Natural Language Engineering (JNLE) 2016 // COLING 2016 // LREC 2016 // EMNLP 2016 // *SEM 2016 // EMNLP 2015 // *SEM 2015 // CICLING 2015 // Journal of Language Resources and Evaluation (LREV) 2014 // EMNLP 2014 // COLING 2014 // WMT 2014 // LREC 2014 // WMT 2013 // Journal of Language Resources and Evaluation (LREV) 2013 // IJCNLP 2013 // *SEM 2013 // Journal of Computer Science and Technology (JCST) 2013 // WMT 2012 // EACL 2012 // LREC 2012 // ACM TIST Journal, Special Issue on Paraphrasing 2011 // EMNLP 2011 // TextInfer 2011 // COLING 2010 // EMNLP 2009 // AAAI 2008

All Publications

2026

Universal NER v2: Towards a Massively Multilingual Named Entity Recognition Benchmark

Blevins, Terra; Mayhew, Stephen; Suppa, Marek; Gonen, Hila; Mirkin, Shachar; Pais, Vasile; Dobrovoljc, Kaja; Giouli, Voula; Kevin, Jun; Yılandiloğlu, Enes; Jang, Eugene; Kim, Eungseo; Seo, Jeongyeon; Gialis, Xenophon; Pinter, Yuval

LREC 2026

While multilingual language models promise to bring the benefits of LLMs to speakers of many languages, gold-standard evaluation benchmarks in most languages to interrogate these assumptions remain scarce. The Universal NER project, now entering its fourth year, is dedicated to building gold-standard multilingual Named Entity Recognition (NER) benchmark datasets. Inspired by existing massively multilingual efforts for other core NLP tasks (e.g., Universal Dependencies), the project uses a general tagset and thorough annotation guidelines to collect standardized, cross-lingual annotations of named entity spans. The first installment (UNER v1) was released in 2024, and the project has continued and expanded since then, with various organizers, annotators, and collaborators in an active community.

2025

All languages matter: Evaluating LMMs on culturally diverse 100 languages

Vayani, Ashmal; Dissanayake, Dinura; Watawana, Hasindri; Ahsan, Noor; Sasikumar, Nevasini; Thawakar, Omkar; Ademtew, Henok Biadglign; Hmaiti, Yahya; Kumar, Amandeep; Kuckreja, Kartik; Maslych, Mykola; Al Ghallabi, Wafa; Mihaylov, Mihail Minkov; Qin, Chao; Shaker, Abdelrahman M; Zhang, Mike; Ihsani, Mahardika Krisna; Esplana, Amiel Gian; Gokani, Monil; Mirkin, Shachar; Singh, Harsh; Srivastava, Ashay; Hamerlik, Endre; Izzati, Fathinah Asma; Maani, Fadillah Adamsyah; Cavada, Sebastian; Chim, Jenny; Gupta, Rohit; Manjunath, Sanjay; Zhumakhanova, Kamila; Rabevohitra, Feno Heriniaina; Amirudin, Azril Hafizi; Ridzuan, Muhammad; Abdul Kareem, Daniya Najiha; More, Ketan Pravin; Li, Kunyang; Shakya, Pramesh; Saad, Muhammad; Ghasemaghaei, Amirpouya; Djanibekov, Amirbek; Azizov, Dilshod; Jankovic, Branislava; Bhatia, Naman; Cabrera, Alvaro; Obando-Ceron, Johan; Otieno, Olympiah; Farestam, Febian; Rabbani, Muztoba; Ballah, Sanoojan; Sanjeev, Santosh; Shtanchaev, Abduragim; Fatima, Maheen; Nguyen, Thao; Kareem, Amrin; Aremu, Toluwani; Xavier, Nathan Augusto Zacarias; Bhatkal, Amit; Toyin, Hawau Olamide; Chadha, Aman; Cholakkal, Hisham; Anwer, Rao Muhammad; Felsberg, Michael; Laaksonen, Jorma; Solorio, Thamar; Choudhury, Monojit; Laptev, Ivan; Shah, Mubarak; Khan, Salman; Khan, Fahad Shahbaz

Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 19565-19575

Existing Large Multimodal Models (LMMs) generally focus on only a few regions and languages. As LMMs continue to improve, it is increasingly important to ensure they understand cultural contexts, respect local sensitivities, and support low-resource languages, all while effectively integrating corresponding visual cues. In pursuit of culturally diverse global multimodal models, our proposed All Languages Matter Benchmark (ALM-bench) represents the largest and most comprehensive effort to date for evaluating LMMs across 100 languages. ALM-bench challenges existing models by testing their ability to understand and reason about culturally diverse images paired with text in various languages, including many low-resource languages traditionally underrepresented in LMM research. The benchmark offers a robust and nuanced evaluation framework featuring various question formats, including true/false, multiple choice, and open-ended questions, which are further divided into short and long-answer categories. ALM-bench design ensures a comprehensive assessment of a model’s ability to handle varied levels of difficulty in visual and linguistic reasoning. To capture the rich tapestry of global cultures, ALM-bench carefully curates content from 13 distinct cultural aspects, ranging from traditions and rituals to famous personalities and celebrations. Through this, ALM-bench not only provides a rigorous testing ground for state-of-the-art open and closed-source LMMs but also highlights the importance of cultural and linguistic inclusivity, encouraging the development of models that can serve diverse global populations effectively. Our benchmark is publicly available at https://mbzuai-oryx.github.io/ALM-Bench/.

2022

BLOOM: A 176B-parameter open-access multilingual language model

Scao, Teven Le; Fan, Angela; Akiki, Christopher; Pavlick, Ellie; Ilić, Suzana; Hesslow, Daniel; Castagné, Roman; Luccioni, Alexandra Sasha; Yvon, François; Gallé, Matthias

arXiv preprint arXiv:2211.05100

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

Emergent Structures and Training Dynamics in Large Language Models

Teehan, Ryan; Clinciu, Miruna; Serikov, Oleg; Szczechla, Eliza; Seelam, Natasha; Mirkin, Shachar; Gokaslan, Aaron

Challenges & Perspectives in Creating Large Language Models

Ryan Teehan, Miruna Clinciu, Oleg Serikov, Eliza Szczechla, Natasha Seelam, Shachar Mirkin, Aaron Gokaslan. Proceedings of BigScience Episode #5 -- Workshop on Challenges & Perspectives in Creating Large Language Models. 2022.

2021

An autonomous debating system

Slonim, Noam; Bilu, Yonatan; Alzate, Carlos; Bar-Haim, Roy; Bogin, Ben; Bonin, Francesca; Choshen, Leshem; Cohen-Karlik, Edo; Dankin, Lena; Edelstein, Lilach

Nature, Volume 591 (7850), pp. 379-384

2019

A Dataset of General-Purpose Rebuttal

Orbach, Matan; Bilu, Yonatan; Gera, Ariel; Kantor, Yoav; Dankin, Lena; Lavee, Tamar; Kotlerman, Lili; Mirkin, Shachar; Jacovi, Michal; Aharonov, Ranit

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Matan Orbach, Yonatan Bilu, Ariel Gera, Yoav Kantor, Lena Dankin, Tamar Lavee, Lili Kotlerman, Shachar Mirkin, Michal Jacovi, Ranit Aharonov, Noam Slonim. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). 2019.

Towards effective rebuttal: Listening comprehension using corpus-wide claim mining

Lavee, Tamar; Orbach, Matan; Kotlerman, Lili; Kantor, Yoav; Gretz, Shai; Dankin, Lena; Mirkin, Shachar; Jacovi, Michal; Bilu, Yonatan; Aharonov, Ranit

Proceedings of the 6th Workshop on Argument Mining, pp. 58-66

Engaging in a live debate requires, among other things, the ability to effectively rebut arguments claimed by your opponent. In particular, this requires identifying these arguments. Here, we suggest doing so by automatically mining claims from a corpus of news articles containing billions of sentences, and searching for them in a given speech. This raises the question of whether such claims indeed correspond to those made in spoken speeches. To this end, we collected a large dataset of 400 speeches in English discussing 200 controversial topics, mined claims for each topic, and asked annotators to identify the mined claims mentioned in each speech. Results show that in the vast majority of speeches debaters indeed make use of such claims. In addition, we present several baselines for the automatic detection of mined claims in speeches, forming the basis for future work. All collected data is freely available for research.

2018

Listening comprehension over argumentative content

Mirkin, Shachar; Moshkowich, Guy; Orbach, Matan; Kotlerman, Lili; Kantor, Yoav; Lavee, Tamar; Jacovi, Michal; Bilu, Yonatan; Aharonov, Ranit; Slonim, Noam

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pp. 719-724

Shachar Mirkin, Guy Moshkowich, Matan Orbach, Lili Kotlerman, Yoav Kantor, Tamar Lavee, Michal Jacovi, Yonatan Bilu, Ranit Aharonov, Noam Slonim. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018.

What did you mention? a large scale mention detection benchmark for spoken and written text

Mass, Yosi; Kotlerman, Lili; Mirkin, Shachar; Venezian, Elad; Witzling, Gera; Slonim, Noam

arXiv preprint arXiv:1801.07507

We describe a large, high-quality benchmark for the evaluation of Mention Detection tools. The benchmark contains annotations of both named entities as well as other types of entities, annotated on different types of text, ranging from clean text taken from Wikipedia, to noisy spoken data. The benchmark was built through a highly controlled crowd sourcing process to ensure its quality. We describe the benchmark, the process and the guidelines that were used to build it. We then demonstrate the results of a state-of-the-art system running on that benchmark.

A recorded debating dataset

Mirkin, Shachar; Jacovi, Michal; Lavee, Tamar; Kuo, Hong-Kwang; Thomas, Samuel; Sager, Leslie; Kotlerman, Lili; Venezian, Elad; Slonim, Noam

Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

This paper describes an English audio and textual dataset of debating speeches, a unique resource for the growing research field of computational argumentation and debating technologies. We detail the process of speech recording by professional debaters, the transcription of the speeches with an Automatic Speech Recognition (ASR) system, their consequent automatic processing to produce a text that is more "NLP-friendly", and in parallel -- the manual transcription of the speeches in order to produce gold-standard "reference" transcripts. We release 60 speeches on various controversial topics, each in five formats corresponding to the different stages in the production of the data. The intention is to allow utilizing this resource for multiple research purposes, be it the addition of in-domain training data for a debate-specific ASR system, or applying argumentation mining on either noisy or clean debate transcripts. We intend to make further releases of this data in the future.

System and method for predicting an optimal machine translation system for a user based on an updated user profile

Mirkin, Shachar; Meunier, Jean-Luc

2017

Joint Learning of Correlated Sequence Labelling Tasks Using Bidirectional Recurrent Neural Networks

Pahuja, Vardaan; Laha, Anirban; Mirkin, Shachar; Raykar, Vikas; Kotlerman, Lili; Lev, Guy

Interspeech 2017

The stream of words produced by Automatic Speech Recognition (ASR) systems is typically devoid of punctuations and formatting. Most natural language processing applications expect segmented and well-formatted texts as input, which is not available in ASR output. This paper proposes a novel technique of jointly modeling multiple correlated tasks such as punctuation and capitalization using bidirectional recurrent neural networks, which leads to improved performance for each of these tasks. This method could be extended for joint modeling of any other correlated sequence labeling tasks.

Personalized Machine Translation: Preserving Original Author Traits

Rabinovich, Ella; Mirkin, Shachar; Patel, Raj Nath; Specia, Lucia; Wintner, Shuly

EACL 2017

Ella Rabinovich, Raj Nath Patel, Shachar Mirkin, Lucia Specia, Shuly Wintner. Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. 2017.

2016

Learning generation templates from dialog transcripts

Venkatapathy, Sriram; Mirkin, Shachar; Dymetman, Marc

Method and system for summarizing a document

Gupta, Anand; Kaur, Manpreet; Mirkin, Shachar

System and method for incrementally updating a reordering model for a statistical machine translation system

Mirkin, Shachar

2015

Motivating Personality-aware Machine Translation

Mirkin, Shachar; Nowson, Scott; Brun, Caroline; Perez, Julien

The 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Language use is known to be influenced by personality traits as well as by sociodemographic characteristics such as age or mother tongue. As a result, it is possible to automatically identify these traits of the author from her texts. It has recently been shown that knowledge of such dimensions can improve performance in NLP tasks such as topic and sentiment modeling. We posit that machine translation is another application that should be personalized. In order to motivate this, we explore whether translation preserves demographic and psychometric traits. We show that, largely, both translation of the source training data into the target language, and the target test data into the source language has a detrimental effect on the accuracy of predicting author traits. We argue that this supports the need for personal and personality-aware machine translation models.

Personalized machine translation: Predicting translational preferences

Mirkin, Shachar; Meunier, Jean-Luc

Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pp. 2019-2025

Machine Translation (MT) has advanced in recent years to produce better translations for clients' specific domains, and sophisticated tools allow professional translators to obtain translations according to their prior edits. We suggest that MT should be further personalized to the end-user level - the receiver or the author of the text - as done in other applications. As a step in that direction, we propose a method based on a recommender systems approach where the user's preferred translation is predicted based on preferences of similar users. In our experiments, this method outperforms a set of non-personalized methods, suggesting that user preference information can be employed to provide better-suited translations for each user.
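
As an illustration only, the idea of predicting a user's preferred translation from the preferences of similar users can be sketched as user-based collaborative filtering. This is a minimal sketch, not the method from the paper: the cosine similarity, the +1/-1 encoding of preferences between two hypothetical systems A and B, and all names and data below are assumptions made for the example.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse preference vectors (dicts)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def predict_preference(target, others, item):
    """Similarity-weighted vote of the other users on `item`.

    A positive result favours system A, a negative one system B,
    and 0.0 means there is no evidence either way.
    """
    num = 0.0
    den = 0.0
    for prefs in others:
        if item not in prefs:
            continue
        sim = cosine(target, prefs)
        num += sim * prefs[item]
        den += abs(sim)
    return num / den if den else 0.0


# Hypothetical data: +1 means the user preferred system A's translation
# of that sentence, -1 means they preferred system B's.
alice = {"s1": 1, "s2": -1}
others = [
    {"s1": 1, "s2": -1, "s3": 1},    # agrees with alice on s1 and s2
    {"s1": -1, "s2": 1, "s3": -1},   # disagrees with alice on s1 and s2
    {"s4": 1, "s3": -1},             # shares no rated sentence with alice
]
score = predict_preference(alice, others, "s3")
```

Here a user who consistently disagrees with the target user gets a negative similarity, so their vote is flipped rather than ignored; the user with no overlap contributes nothing.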

XRCE personal language analytics engine for multilingual author profiling

Nowson, Scott; Perez, Julien; Brun, Caroline; Mirkin, Shachar; Roux, Claude

Working Notes Papers of the CLEF, pp. 1412-1424

This technical notebook describes the methodology used - and results achieved - for the PAN 2015 Author Profiling Challenge by the team from Xerox Research Centre Europe (XRCE). This year, personality traits are introduced alongside age and gender in a corpus of tweets in four languages - English, Spanish, Italian and Dutch. We describe a largely language agnostic methodology for classification which uses language specific linguistic processing to generate features. We also report on experiments in which we use machine translation to accommodate for languages in which there is less training data. Native language results are successful, but socio-demographic signals in language seem to be lost under MT conditions.

Semantic refining of cross-lingual information retrieval results

Mirkin, Shachar; Lagos, Nikolaos; Calapodescu, Ioan

Refining inference rules with temporal event clustering

Jacquet, Guillaume; Mirkin, Shachar

Machine translation-driven authoring system and method

Venkatapathy, Sriram; Mirkin, Shachar

2014

Confidence-driven rewriting of source texts for improved translation

Mirkin, Shachar; Venkatapathy, Sriram; Dymetman, Marc

Incrementally Updating the SMT Reordering Model

Mirkin, Shachar

Proceedings of The 28th Pacific Asia Conference on Language, Information and Computing (PACLIC), Phuket, Thailand

This work is concerned with incrementally training statistical machine translation (SMT) models when new data becomes available. That, in contrast to re-training new models based on the entire accumulated data. Incremental training provides a way to perform faster, more frequent model updates, enabling keeping the SMT system up-to-date with the most recent data. Specifically, we address incrementally updating the reordering model (RM), a component in phrase-based machine translation that models phrase order changes between the source and the target languages, and for which incremental training has not been proposed so far. First, we show that updating the reordering model is helpful for improving translation quality. Second, we present an algorithm for updating the reordering model within the popular Moses SMT system. Our method produces the exact same model as when training the model from scratch, but doing so much faster.

Comparison of data selection techniques for the translation of video lectures

Wuebker, Joern; Ney, Hermann; Martínez-Villaronga, Adrià; Giménez Pastor, Adrián; Juan Císcar, Alfonso; Servan, Christophe; Dymetman, Marc; Mirkin, Shachar

For the task of online translation of scientific video lectures, using huge models is not possible. In order to get smaller and efficient models, we perform data selection. In this paper, we perform a qualitative and quantitative comparison of several data selection techniques, based on cross-entropy and infrequent n-gram criteria. In terms of BLEU, a combination of translation and language model cross-entropy achieves the most stable results. As another important criterion for measuring translation quality in our application, we identify the number of out-of-vocabulary words. Here, infrequent n-gram recovery shows superior performance. Finally, we combine the two selection techniques in order to benefit from both their strengths.
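
To make the cross-entropy criterion mentioned in the abstract above concrete, here is a minimal sketch of one common variant, cross-entropy difference scoring: a pool sentence is ranked by its cross-entropy under an in-domain language model minus its cross-entropy under a model of the general pool. The add-one-smoothed unigram models, the function names, and the toy sentences are all assumptions for illustration; the paper itself may use different models and also combines translation-model scores.

```python
from collections import Counter
from math import log2


def unigram_model(sentences, vocab):
    """Add-one-smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    return {w: (counts[w] + 1) / (total + len(vocab)) for w in vocab}


def cross_entropy(sentence, model):
    """Per-word cross-entropy (in bits) of a sentence under a unigram model."""
    words = sentence.split()
    return -sum(log2(model[w]) for w in words) / len(words)


def select(pool, in_domain, k):
    """Return the k pool sentences with the lowest cross-entropy difference."""
    vocab = {w for s in pool + in_domain for w in s.split()}
    lm_in = unigram_model(in_domain, vocab)
    lm_pool = unigram_model(pool, vocab)
    ranked = sorted(
        pool,
        key=lambda s: cross_entropy(s, lm_in) - cross_entropy(s, lm_pool),
    )
    return ranked[:k]


# Hypothetical toy data: a tiny in-domain sample and a general pool.
in_domain = [
    "the lecture covers linear algebra",
    "this lecture covers matrix algebra",
]
pool = [
    "the parliament voted on the budget",
    "the lecture covers matrix products",
    "stock prices fell on monday",
]
selected = select(pool, in_domain, 1)
```

A lower score means the sentence looks more like the in-domain sample than like the pool as a whole, so the lecture-like sentence is the one selected here.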

Data Selection for Compact Adapted SMT Models

Mirkin, Shachar; Besacier, Laurent

Proceedings of AMTA

Data selection is a common technique for adapting statistical translation models for a specific domain, which has been shown to both improve translation quality and to reduce model size. Selection relies on some in-domain data, of the same domain of the texts expected to be translated. Selecting the sentence-pairs that are most similar to the in-domain data from a pool of parallel texts has been shown to be effective; yet, this approach holds the risk of resulting in a limited coverage, when necessary n-grams that do appear in the pool are less similar to in-domain data that is available in advance. Some methods select additional data based on the actual text that needs to be translated. While useful, this is not always a practical scenario. In this work we describe an extensive exploration of data selection techniques over Arabic to French datasets, and propose methods to address both similarity and coverage considerations while maintaining a limited model size.

Text summarization through entailment-based minimum vertex cover

Gupta, Anand; Kaur, Manpreet; Mirkin, Shachar; Singh, Adarsh; Goyal, Aseem

Proceedings of the Third Joint Conference on Lexical and Computational Semantics (*SEM 2014), pp. 75-80

Sentence Connectivity is a textual characteristic that may be incorporated intelligently for the selection of sentences of a well meaning summary. However, the existing summarization methods do not utilize its potential fully. The present paper introduces a novel method for single-document text summarization. It poses the text summarization task as an optimization problem, and attempts to solve it using Weighted Minimum Vertex Cover (WMVC), a graph-based algorithm. Textual entailment, an established indicator of semantic relationships between text units, is used to measure sentence connectivity and construct the graph on which WMVC operates. Experiments on a standard summarization dataset show that the suggested algorithm outperforms related methods.
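
For readers unfamiliar with weighted vertex cover, the following is a small illustrative sketch of the general problem named in the abstract above, not the paper's algorithm. It assumes a graph whose nodes are sentence indices and whose edges link sentences judged to be connected, and it uses a simple greedy heuristic (repeatedly pick the node with the lowest weight per uncovered edge). The node weights, the edges, and the heuristic itself are hypothetical choices made for the example, and the heuristic does not guarantee a minimum-weight cover.

```python
def greedy_weighted_vertex_cover(weights, edges):
    """Greedy heuristic for weighted vertex cover.

    weights: dict mapping node -> positive weight
    edges:   iterable of (u, v) pairs
    Returns a sorted list of nodes touching every edge.
    """
    uncovered = {frozenset(e) for e in edges}
    cover = []
    while uncovered:
        # Count, for each node, how many uncovered edges it touches.
        degree = {}
        for edge in uncovered:
            for node in edge:
                degree[node] = degree.get(node, 0) + 1
        # Pick the node that is cheapest per edge it would cover;
        # sorting first makes tie-breaking deterministic.
        best = min(sorted(degree), key=lambda n: weights[n] / degree[n])
        cover.append(best)
        uncovered = {edge for edge in uncovered if best not in edge}
    return sorted(cover)


# Hypothetical sentence graph: four sentences in a chain 0-1-2-3.
weights = {0: 1.0, 1: 3.0, 2: 1.0, 3: 2.0}
edges = [(0, 1), (1, 2), (2, 3)]
cover = greedy_weighted_vertex_cover(weights, edges)
```

In a summarization setting the selected nodes would correspond to the sentences kept in the summary; how weights are derived from entailment scores is left out of this sketch.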

2013

Confidence-driven Rewriting for Improved Translation

Mirkin, Shachar; Venkatapathy, Sriram; Dymetman, Marc

MT Summit

Assessing quick update methods of statistical translation models

Mirkin, Shachar; Cancedda, Nicola

Proceedings of the International Workshop of Spoken Language Translation (IWSLT), pp. 264-271

Error prediction with partial feedback

Darling, William; Archambeau, Cédric; Mirkin, Shachar; Bouchard, Guillaume

Machine Learning and Knowledge Discovery in Databases: European Conference, ECML PKDD 2013, Prague, Czech Republic, September 23-27, 2013, Proceedings, Part II 13, pp. 80-94

SORT: An Interactive Source-Rewriting Tool for Improved Translation

Mirkin, Shachar; Venkatapathy, Sriram; Dymetman, Marc; Calapodescu, Ioan

ACL Demos

The quality of automatic translation is affected by many factors. One is the divergence between the specific source and target languages. Another lies in the source text itself, as some texts are more complex than others. One way to handle such texts is to modify them prior to translation. Yet, an important factor that is often overlooked is the source translatability with respect to the specific translation system and the specific model that are being used. In this paper we present an interactive system where source modifications are induced by confidence estimates that are derived from the translation model in use. Modifications are automatically generated and proposed for the user’s approval. Such a system can reduce postediting effort, replacing it by cost-effective pre-editing that can be done by monolinguals.

2012

An SMT-driven authoring tool

Venkatapathy, Sriram; Mirkin, Shachar

Proceedings of COLING 2012: Demonstration Papers, pp. 459-466

This paper presents a tool for assisting users in composing texts in a language they do not know. While Machine Translation (MT) is pretty useful for understanding texts in an unfamiliar language, current MT technology has yet to reach the stage where it can be used reliably without a post-editing step. This work attempts to make a step towards achieving this goal. We propose a tool that provides suggestions for the continuation of the text in the source language (that the user knows), creating texts that can be translated to the target language (that the user does not know). In terms of functionality, our tool resembles text prediction applications. However, the target language, through a Statistical Machine Translation (SMT) model, drives the composition and not only the source language. We present the user interface and describe the considerations that underline the suggestion process. A simulation of user interaction shows that composition speed can be substantially reduced and provides initial positive feedback as to the ability to generate better translations.

2011

Context and Discourse in Textual Entailment Inference

Mirkin, Shachar

PhD Thesis

Knowledge and Tree-Edits in Learnable Entailment Proofs

Stern, Asher; Mirkin, Shachar; Shnarch, Eyal; Kotlerman, Lili; Dagan, Ido; Lotan, Amnon; Berant, Jonathan

TAC

This paper describes BIUTEE - Bar Ilan University Textual Entailment Engine. BIUTEE is a natural language inference system in which the hypothesis is proven by the text, based on linguistic- and world- knowledge resources, as well as syntactically motivated tree transformations. The main progress in BIUTEE in the last year is a new confidence model that estimates the validity of the proof found by BIUTEE.

Classification-based contextual preferences

Mirkin, Shachar; Dagan, Ido; Kotlerman, Lili; Szpektor, Idan

Proceedings of the TextInfer 2011 Workshop on Textual Entailment, pp. 20-29

This paper addresses context matching in textual inference. We formulate the task under the Contextual Preferences framework which broadly captures contextual aspects of inference. We propose a generic classification-based scheme under this framework which coherently attends to context matching in inference and may be employed in any inference-based task. As a test bed for our scheme we use the Name-based Text Categorization (TC) task. We define an integration of Contextual Preferences into the TC setting and present a concrete self-supervised model which instantiates the generic scheme and is applied to address context matching in the TC task. Experiments on standard TC datasets show that our approach outperforms the state of the art in context modeling for Name-based TC.

2010

Rule Chaining and Approximate Match in textual inference

Stern, Asher; Shnarch, Eyal; Mirkin, Shachar; Kotlerman, Lili; Zeichner, Naomi; Dagan, Ido; Lotan, Amnon; Berant, Jonathan

TAC

This paper describes the participation of Bar-Ilan university in the sixth RTE challenge. Our textual-entailment engine, BiuTee, was enhanced with new components that introduce chaining of lexical-entailment rules, and tackle the problem of approximately matching the text and the hypothesis after all available knowledge of entailment rules was utilized. We have also re-engineered our system aiming at an open-source open architecture. BiuTee’s performance is better than the median of all-submissions, and outperforms significantly an IR-oriented baseline.

Learning an expert from human annotations in statistical machine translation: The case of out-of-vocabulary words

Aziz, Wilker; Dymetman, Marc; Specia, Lucia; Mirkin, Shachar

Proceedings of the 14th Annual Conference of the European Association for Machine Translation

We present a general method for incorporating an “expert” model into a Statistical Machine Translation (SMT) system, in order to improve its performance on a particular “area of expertise”, and apply this method to the specific task of finding adequate replacements for Out-of-Vocabulary (OOV) words. Candidate replacements are paraphrases and entailed phrases, obtained using monolingual resources. These candidate replacements are transformed into “dynamic biphrases”, generated at decoding time based on the context of each source sentence. Standard SMT features are enhanced with a number of new features aimed at scoring translations produced by using different replacements. Active learning is used to discriminatively train the model parameters from human assessments of the quality of translations. The learning framework yields an SMT system which is able to deal with sentences containing OOV words but also guarantees that the performance is not degraded for input sentences without OOV words. Results of experiments on English-French translation show that this method outperforms previous work addressing OOV words in terms of acceptability.

Recognising entailment within discourse

Mirkin, Shachar; Berant, Jonathan; Dagan, Ido; Shnarch, Eyal

Proceedings of the 23rd International Conference on Computational Linguistics, pp. 770-778

Texts are commonly interpreted based on the entire discourse in which they are situated. Discourse processing has been shown useful for inference-based application; yet, most systems for textual entailment – a generic paradigm for applied inference – have only addressed discourse considerations via off-the-shelf coreference resolvers. In this paper we explore various discourse aspects in entailment inference, suggest initial solutions for them and investigate their impact on entailment performance. Our experiments suggest that discourse provides useful information, which significantly improves entailment inference, and should be better addressed by future entailment systems.

A Resource for Investigating the Impact of Anaphora and Coreference on Inference

Abad, Azad; Bentivogli, Luisa; Dagan, Ido; Giampiccolo, Danilo; Mirkin, Shachar; Pianta, Emanuele; Stern, Asher

LREC

Discourse phenomena play a major role in text processing tasks. However, so far relatively little study has been devoted to the relevance of discourse phenomena for inference. Therefore, an experimental study was carried out to assess the relevance of anaphora and coreference for Textual Entailment (TE), a prominent inference framework. First, the annotation of anaphoric and coreferential links in the RTE-5 Search data set was performed according to a specifically designed annotation scheme. As a result, a new data set was created where all anaphora and coreference instances in the entailing sentences which are relevant to the entailment judgment are solved and annotated. A by-product of the annotation is a new “augmented” data set, where all the referring expressions which need to be resolved in the entailing sentences are replaced by explicit expressions. Starting from the final output of the annotation, the actual impact of discourse phenomena on inference engines was investigated, identifying the kind of operations that the systems need to apply to address discourse phenomena and trying to find direct mappings between these operations and annotation types.

Assessing the role of discourse references in entailment inference

Mirkin, Shachar; Dagan, Ido; Padó, Sebastian

Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1209-1219

Discourse references, notably coreference and bridging, play an important role in many text understanding applications, but their impact on textual entailment is yet to be systematically understood. On the basis of an in-depth analysis of entailment instances, we argue that discourse references have the potential of substantially improving textual entailment recognition, and identify a number of research directions towards this goal.

2009

Addressing Discourse and Document Structure in the RTE Search Task

Mirkin, Shachar; Bar-Haim, Roy; Dagan, Ido; Shnarch, Eyal; Stern, Asher; Szpektor, Idan; Berant, Jonathan

TAC

This paper describes Bar-Ilan University’s submissions to RTE-5. This year we focused on the Search pilot, enhancing our entailment system to address two main issues introduced by this new setting: scalability and, primarily, document-level discourse. Our system achieved the highest score on the Search task amongst participating groups, and proposes first steps towards addressing this challenging setting.

Evaluating the inferential utility of lexical-semantic resources

Mirkin, Shachar; Dagan, Ido; Shnarch, Eyal

Proceedings of the 12th Conference of the European Chapter of the ACL (EACL 2009), pp. 558-566

Lexical-semantic resources are used extensively for applied semantic inference, yet a clear quantitative picture of their current utility and limitations is largely missing.

Source-language entailment modeling for translating unknown terms

Mirkin, Shachar; Specia, Lucia; Cancedda, Nicola; Dagan, Ido; Dymetman, Marc; Szpektor, Idan

Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pp. 791-799

This paper addresses the task of handling unknown terms in SMT. We propose using source-language monolingual models and resources to paraphrase the source text prior to translation. We further present a conceptual extension to prior work by allowing translations of entailed texts rather than paraphrases only. A method for performing this process efficiently is presented and applied to some 2500 sentences with unknown terms. Our experiments show that the proposed approach substantially increases the number of properly translated texts.

2008

Efficient semantic deduction and approximate matching over compact parse forests

Bar-Haim, Roy; Berant, Jonathan; Dagan, Ido; Greental, Iddo; Mirkin, Shachar; Shnarch, Eyal; Szpektor, Idan

Proceedings of TAC

Semantic inference is often modeled as application of entailment rules, which specify generation of entailed sentences from a source sentence. Efficient generation and representation of entailed consequents is a fundamental problem common to such inference methods. We present a new data structure, termed compact forest, which allows efficient generation and representation of entailed consequents, each represented as a parse tree. Rule-based inference is complemented with a new approximate matching measure inspired by tree kernels, which is computed efficiently over compact forests. Our system also makes use of novel large-scale entailment rule bases, derived from Wikipedia as well as from information about predicates and their argument mapping, gathered from available lexicons and complemented by unsupervised learning.

2006

Integrating pattern-based and distributional similarity methods for lexical entailment acquisition

Mirkin, Shachar; Dagan, Ido; Geffet, Maayan

Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pp. 579-586

This paper addresses the problem of acquiring lexical semantic relationships, applied to the lexical entailment relation. Our main contribution is a novel conceptual integration between the two distinct acquisition paradigms for lexical relations - the pattern-based and the distributional similarity approaches. The integrated method exploits mutual complementary information of the two approaches to obtain candidate relations and informative characterizing features. Then, a small size training set is used to construct a more accurate supervised classifier, showing significant increase in both recall and precision over the original approaches.