Exploiting Sentence and Context Representations in Deep Neural Models for Spoken Language Understanding


Date: 28 September 2016, 14:00, Room C005

Speaker: Lina Rojas-Barahona

Abstract: This paper presents a deep learning architecture for the semantic decoder component of a Statistical Spoken Dialogue System. In a slot-filling dialogue, the semantic decoder predicts the dialogue act and a set of slot-value pairs from a set of n-best hypotheses returned by the Automatic Speech Recognition module. Most current models for spoken language understanding assume (i) word-aligned semantic annotations, as in sequence taggers, and (ii) delexicalisation, a mapping of input words to domain-specific concepts using heuristics that try to capture morphological variation but that scale neither to other domains nor to language variation (e.g., morphology, synonyms, paraphrasing). In this work, the semantic decoder is trained on unaligned semantic annotations and uses distributed semantic representation learning to overcome the limitations of explicit delexicalisation. The proposed architecture uses a convolutional neural network for the sentence representation and a long short-term memory (LSTM) network for the context representation. Results are presented on the publicly available DSTC2 corpus and on an In-car corpus which is similar to DSTC2 but has a significantly higher word error rate (WER).
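
A minimal, hypothetical PyTorch sketch of such an architecture (not the authors' implementation; all layer names and sizes are illustrative assumptions): a CNN with max-pooling encodes the current utterance, an LSTM runs over the encoded preceding turns, and the two vectors feed a joint classifier over dialogue-act / slot-value labels.

```python
import torch
import torch.nn as nn

class SemanticDecoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, n_filters=64, n_labels=20):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # CNN over the word embeddings of the current utterance
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel_size=3, padding=1)
        # LSTM over the encoded preceding turns (dialogue context)
        self.context_lstm = nn.LSTM(n_filters, n_filters, batch_first=True)
        self.classifier = nn.Linear(2 * n_filters, n_labels)

    def encode_utterance(self, word_ids):            # (batch, seq_len)
        e = self.embed(word_ids).transpose(1, 2)      # (batch, emb_dim, seq_len)
        h = torch.relu(self.conv(e))                  # (batch, n_filters, seq_len)
        return h.max(dim=2).values                    # max-pooling -> (batch, n_filters)

    def forward(self, utterance_ids, context_ids):    # context_ids: (batch, n_turns, seq_len)
        sent = self.encode_utterance(utterance_ids)
        turns = [self.encode_utterance(context_ids[:, t]) for t in range(context_ids.size(1))]
        _, (ctx, _) = self.context_lstm(torch.stack(turns, dim=1))
        # concatenate sentence and context representations, score the labels
        return self.classifier(torch.cat([sent, ctx[-1]], dim=1))
```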


Project GolFred Presentation


Date: 16 September, 2016, 14:00, Room LORIA B-011

Speaker: Émilie Colin

Abstract: The golfred project is about machine reading for the narrative generation of spatial experiences in service robots. A robot, Golem, reads the panels it finds while moving in a real environment. The phrases read by Golem are transformed by Fred into a semantic representation, which is then linked to and enriched with DBpedia knowledge. The task of the Synalp team is to develop a generator from the final representation produced by Fred. These representations may contain any kind of event, role or specification. I worked with VerbNet to generate a set of grammar trees linked to semantic schemas. I will present GenI, VerbNet and their association, and will close my presentation with the work on Fred data.
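
A minimal sketch (assumed, not the project's code) of the DBpedia enrichment step: an entity URI taken from a panel's semantic representation is looked up on the public DBpedia SPARQL endpoint to retrieve additional knowledge, here its abstract. The function name and query are illustrative.

```python
import requests

def dbpedia_abstract(entity_uri, lang='en'):
    """Fetch the DBpedia abstract of an entity via the public SPARQL endpoint."""
    query = f"""
    SELECT ?abstract WHERE {{
      <{entity_uri}> <http://dbpedia.org/ontology/abstract> ?abstract .
      FILTER (lang(?abstract) = '{lang}')
    }}"""
    r = requests.get('https://dbpedia.org/sparql',
                     params={'query': query,
                             'format': 'application/sparql-results+json'})
    bindings = r.json()['results']['bindings']
    return bindings[0]['abstract']['value'] if bindings else None

# e.g. dbpedia_abstract('http://dbpedia.org/resource/Nancy,_France')
```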


Multimodal content-aware image thumbnailing


Date: 22 September, 2016, 10:00, Room B-011

Speaker: Kohei Yamamoto

Abstract: In this presentation, I would like to introduce my previous research topic, multimodal image thumbnailing. As background, mobile applications (in this case, news article recommendation) face the key problem of eliminating redundant information in order to provide more relevant content within limited time and screen space. To tackle this problem, I proposed a multimodal image thumbnailing method that considers both images and text. The proposed method generates an energy map expressing content by aligning image fragments and words via multimodal neural networks; using this energy map, an appropriate region can be cropped with respect to the corresponding text. We evaluated this approach on a real data set based on news articles that appeared on Yahoo! JAPAN, and experimental results demonstrate the effectiveness of the proposed method.
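
A minimal sketch of the cropping step under stated assumptions (this is not the author's implementation): given a text-conditioned energy map over the image, choose the fixed-size window with maximal total energy, computed efficiently with an integral image.

```python
import numpy as np

def best_crop(energy, crop_h, crop_w):
    """Return (top, left) of the crop_h x crop_w window with maximal total energy."""
    H, W = energy.shape
    ii = np.zeros((H + 1, W + 1))
    ii[1:, 1:] = np.cumsum(np.cumsum(energy, axis=0), axis=1)  # integral image
    best, best_pos = -np.inf, (0, 0)
    for top in range(H - crop_h + 1):
        for left in range(W - crop_w + 1):
            # window sum from the summed-area table
            s = (ii[top + crop_h, left + crop_w] - ii[top, left + crop_w]
                 - ii[top + crop_h, left] + ii[top, left])
            if s > best:
                best, best_pos = s, (top, left)
    return best_pos

# e.g. energy = np.random.rand(240, 320); top, left = best_crop(energy, 120, 160)
```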


Unsupervised Ranking of Knowledge Bases for Named Entity Recognition


Date: 02 September, 2016, 10:00, Room LORIA A-008

Speaker: Yassine M'rabet (Lister Hill National Center for Biomedical Communications, National Library of Medicine, USA)

Abstract: With the continuous growth of freely accessible knowledge bases and the heterogeneity of textual corpora, selecting the most adequate knowledge base for named entity recognition is becoming a challenge in itself. In this talk, we will present an unsupervised method to rank knowledge bases according to their adequacy for the recognition of named entities in a given corpus. Building on a state-of-the-art unsupervised entity linking approach, we propose several evaluation metrics to measure the lexical and structural adequacy of a knowledge base for a given corpus. We study the correlation between these metrics and three standard performance measures: precision, recall and F1 score. Our multi-domain experiments on 9 corpora and 6 knowledge bases show that three of the proposed metrics are strong performance predictors, with Pearson correlations of 0.62 to 0.76 with precision and 0.96 with both recall and F1 score.
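
A minimal sketch, assuming one simple lexical-adequacy score per (corpus, knowledge base) pair, of how such a metric can be correlated with the F1 of an entity linker; the metrics actually proposed in the talk are more elaborate.

```python
from scipy.stats import pearsonr

def lexical_adequacy(corpus_mentions, kb_labels):
    """Fraction of corpus mentions whose surface form matches a KB label (toy metric)."""
    labels = {l.lower() for l in kb_labels}
    matched = sum(1 for m in corpus_mentions if m.lower() in labels)
    return matched / len(corpus_mentions) if corpus_mentions else 0.0

def correlation_with_f1(adequacy_scores, f1_scores):
    """Pearson correlation between the adequacy metric and observed F1 across corpora."""
    r, p_value = pearsonr(adequacy_scores, f1_scores)
    return r, p_value
```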


Diachronic clustering tools for analysing the evolution of scientific production


Date: 01 July, 2016, 14:30, Room LORIA B-011

Speaker: Nicolas Dugué

Abstract: Within the ISTEX-R project, our mission is to facilitate tracking the evolution of scientific production through the study of the ISTEX publication database. In this context, we have set up a diachronic clustering solution that makes it possible to follow research topics over time: merging, splitting, appearance, disappearance. We will first detail the cluster quality-measurement and labelling tools required by our approach. We will then present preliminary results on an ISTEX corpus. Finally, we will describe a visualisation platform dedicated to exploring these results.
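
A minimal sketch of diachronic cluster matching under simplifying assumptions (this is not the ISTEX-R implementation): clusters from two consecutive time slices are linked by the Jaccard overlap of their term sets, which makes merges, splits, appearances and disappearances detectable.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def diachronic_links(clusters_t, clusters_t1, threshold=0.3):
    """clusters_*: dict cluster_id -> set of terms; returns overlap links plus
    the clusters that appear or disappear between the two time slices."""
    links = []
    for i, ci in clusters_t.items():
        for j, cj in clusters_t1.items():
            sim = jaccard(ci, cj)
            if sim >= threshold:
                links.append((i, j, sim))
    appeared = set(clusters_t1) - {j for _, j, _ in links}    # no ancestor at time t
    disappeared = set(clusters_t) - {i for i, _, _ in links}  # no descendant at t+1
    # a cluster linked to several clusters in the other slice signals a split or a merge
    return links, appeared, disappeared
```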


Generating Stories from Different Event Orders: A Statistical Approach


Date: 02 May 2016, 16:00, Room B011

Speaker: Anastasia Shimorina

Abstract: This research presents a strategy for finding statistically significant language patterns and using them to generate new texts. Specifically, temporal relations in narrative are explored. To investigate the temporal structure of narrative, a specially designed corpus of stories is used. For each story, the main events and their chronological and discourse-level orders are known. This corpus allows us to identify common temporal models for specific orders of events at the discourse level. The Conditional Random Fields method is applied to predict the best temporal model for each event order. The acquired temporal models are then used in a template-based natural language generation system that outputs stories. The stories generated by the system are evaluated by human subjects. We demonstrate that stories generated according to the acquired temporal models are adequately interpreted by humans.
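
A minimal sketch of the CRF setup under stated assumptions (the feature set, labels and the sklearn_crfsuite dependency are illustrative, not the author's system): each story is a sequence of events, features describe how the discourse order deviates from the chronological one, and a linear-chain CRF predicts a temporal pattern per event.

```python
import sklearn_crfsuite  # assumed dependency, not named in the talk

def event_features(chrono_positions):
    """chrono_positions[i] = chronological rank of the i-th event in discourse order."""
    feats = []
    for i, chrono in enumerate(chrono_positions):
        feats.append({
            'discourse_pos': i,
            'chrono_pos': chrono,
            # event narrated after one that happens later => flashback-like pattern
            'is_flashback': i > 0 and chrono < chrono_positions[i - 1],
        })
    return feats

# X: one feature sequence per story, y: one gold temporal label per event (labels invented)
# X = [event_features([2, 0, 1]), ...]; y = [['later', 'earlier', 'then'], ...]
# crf = sklearn_crfsuite.CRF(algorithm='lbfgs', max_iterations=100)
# crf.fit(X, y); predictions = crf.predict(X)
```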


NVIDIA Grant


NVIDIA has granted our team an NVIDIA Titan X GPU card, which will be extremely useful for speeding up our deep learning experiments on NLP. We have installed the Titan X card and it is now up and running.

We gratefully acknowledge the support of NVIDIA Corporation with the donation of this Titan X GPU to be used in our research!


Cross-lingual transfer of syntactic dependencies via partial training of a transition-based parser


Date: 18 April, 2016, 11:00, Room LORIA B013

Speaker: Ophélie Lacroix

Abstract: In NLP, supervised learning methods are widely used because of their effectiveness, but they require access to large sets of correctly annotated data. Such data are, however, not available for all languages. Cross-lingual information transfer is one solution for building analysis tools for under-resourced languages, by relying on the information available in one or more well-resourced source languages. We are particularly interested in the transfer of dependency annotations using parallel corpora. The annotations available in the source language are projected onto the target-language data via alignment links. We choose to limit the projection to the most reliable dependencies, producing partially annotated target data. We then show that it is possible to train a transition-based parser from partially annotated data thanks to the use of a dynamic oracle. This simple transfer method achieves performance competitive with recent state-of-the-art methods, while having a lower algorithmic cost.
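
A minimal sketch (an assumed simplification, not the paper's code) of the projection step: source-side dependencies are carried over to target tokens through word-alignment links, and only dependencies whose head and dependent are both aligned one-to-one are kept, yielding a partially annotated target sentence.

```python
def project_dependencies(src_deps, alignment):
    """src_deps: list of (head, dep, label) source-token indices.
    alignment: list of (src_idx, tgt_idx) word-alignment links.
    Returns target-side (head, dep, label) triples for 'sure' projections only."""
    src2tgt = {}
    tgt_in_degree = {}
    for s, t in alignment:
        src2tgt.setdefault(s, set()).add(t)
        tgt_in_degree[t] = tgt_in_degree.get(t, 0) + 1
    projected = []
    for head, dep, label in src_deps:
        h_links = src2tgt.get(head, set())
        d_links = src2tgt.get(dep, set())
        # keep only dependencies whose endpoints are aligned one-to-one (the reliable links)
        if len(h_links) == 1 and len(d_links) == 1:
            h, d = next(iter(h_links)), next(iter(d_links))
            if tgt_in_degree[h] == 1 and tgt_in_degree[d] == 1 and h != d:
                projected.append((h, d, label))
    return projected
```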


Research and Challenges in Natural Language Processing at the GPLSI group


Date: 14 April, 2016, 10:30, Room B011

Speaker: Elena Lloret (University of Alicante, Spain)

Title: Research and Challenges in Natural Language Processing at the GPLSI group

Abstract: In this talk, I will introduce the research carried out by the GPLSI Research Group of the University of Alicante (Spain). I will first provide brief introductory information about the group. Then, I will summarise the main research areas addressed, as well as the most recent projects and applications developed. Finally, I will focus on Text Summarization and Natural Language Generation, the research fields I am most interested in. I will outline the work in progress, together with the challenges that need to be faced.


Paraphrase Generation from Latent-Variable PCFGs for Semantic Parsing


Date: 21st January, 2016, 14:00, Room C005

Speaker: Shashi Narayan (U. Edinburgh, UK)

Title: Paraphrase Generation from Latent-Variable PCFGs for Semantic Parsing

Abstract: One of the limitations of semantic parsing approaches to open-domain question answering is the lexicosyntactic gap between natural language questions and knowledge base entries -- there are many ways to ask a question, all with the same answer. In this paper we propose to bridge this gap by generating paraphrases of the input question, with the goal that at least one of them will be mapped to a correct knowledge-base query. We introduce a novel grammar model for paraphrase generation that does not require any sentence-aligned paraphrase corpus. Our key idea is to leverage the flexibility and scalability of latent-variable probabilistic context-free grammars to sample paraphrases. We evaluate our paraphrases extrinsically by plugging them into a semantic parser for Freebase. Our experiments on the WebQuestions benchmark dataset show that the performance of the semantic parser improves significantly over strong baselines.
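
A minimal sketch of sampling from a plain PCFG (not the latent-variable grammar of the paper; the toy rules are invented): productions are sampled top-down according to their probabilities, and a paraphrase model would bias these probabilities toward rephrasings of the input question.

```python
import random

GRAMMAR = {  # hypothetical toy rules: lhs -> list of (rhs, probability)
    'S':      [(['NP', 'VP'], 1.0)],
    'NP':     [(['who'], 0.5), (['which', 'person'], 0.5)],
    'VP':     [(['wrote', 'NP_obj'], 1.0)],
    'NP_obj': [(['Hamlet'], 1.0)],
}

def sample(symbol):
    """Expand a symbol into a list of terminal words by sampling productions."""
    if symbol not in GRAMMAR:  # terminal word
        return [symbol]
    rules, probs = zip(*GRAMMAR[symbol])
    rhs = random.choices(rules, weights=probs, k=1)[0]
    return [w for s in rhs for w in sample(s)]

print(' '.join(sample('S')))  # e.g. "which person wrote Hamlet"
```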

Bio: Shashi Narayan is a research associate at the School of Informatics, University of Edinburgh. He is currently working with Shay Cohen on spectral methods for parsing and generation. Before that, he earned his doctoral degree in 2014 from the Université de Lorraine, under the supervision of Claire Gardent. He received an Erasmus Mundus Masters scholarship (2009-2011) in Language and Communication Technology (EM-LCT), and completed his Bachelor of Technology (Honors, 2005-2009) in Computer Science and Engineering at the Indian Institute of Technology (IIT) Kharagpur, India.

He is interested in the application of syntax and semantics to various NLP problems, in particular natural language generation, parsing, sentence simplification, paraphrase generation and question answering.

