Distributional Semantic Spaces: Creation and Applications


Date: 30 November, 2016, 14:00

Speaker: Denis Paperno

Abstract : Distributional semantic vectors (also known as word embeddings) are increasingly popular in various natural language tasks. The talk will describe how distributional semantic models are created, investigate some of the model hyperparameters, and illustrate their applications.
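As a rough illustration of how a count-based distributional model can be built, here is a minimal sketch. It assumes a toy three-sentence corpus and a symmetric context window; the function names and the `window` hyperparameter are illustrative choices, not necessarily the models covered in the talk.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the context words seen within `window` tokens."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus, invented purely for illustration.
corpus = [
    "the cat drinks milk".split(),
    "the dog drinks water".split(),
    "the car needs fuel".split(),
]
vecs = cooccurrence_vectors(corpus, window=2)
sim_cat_dog = cosine(vecs["cat"], vecs["dog"])  # share "the" and "drinks"
sim_cat_car = cosine(vecs["cat"], vecs["car"])  # share only "the"
```

Words that occur in similar contexts ("cat" and "dog") end up with more similar vectors than words that do not ("cat" and "car"). The window size is one example of the kind of hyperparameter such models expose.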


XMG2: Describing Description Languages


Date : 01 December 2016, 11:00 am, Room B013

Speaker: Yannick Parmentier

Abstract : In this talk, we introduce XMG2, a modular and extensible tool for various linguistic description tasks. Based on the notion of meta-compilation (that is, compilation of compilers), XMG2 reuses the main concepts underlying XMG, namely logic programming and constraint satisfaction, to generate on-demand XMG-like compilers by assembling elementary units called language bricks. This brick-based definition of compilers lets users design description languages in a highly flexible way. In particular, it makes it possible to support several levels of linguistic description (e.g. syntax, morphology) within a single description language. XMG2 aims to give users the means to easily define description languages that fit linguistic intuition as closely as possible.


Is Very Deep Convolutional Neural Network necessary for Text Classification?


Date : 01 December 2016, 10:00 am, Room B013

Speaker: Hoa Le Thien

Abstract : Convolutional neural networks have long been prominent in image classification, where very deep architectures achieve state-of-the-art performance. The same has been demonstrated for speech recognition, but does it always hold for text classification? Many results cast doubt on this. In this presentation, I will briefly explain the structure of a shallow convolutional neural network and then compare its results with those of a very deep ConvNet. Other models, such as word2vec and fastText, will also be discussed. The presentation will conclude with a new perspective for future research.


Learning Embeddings to lexicalise RDF Properties


Date : 10 November 2016, 10:30am, Room B013

Speaker: Laura Perez-Beltrachini

Abstract :
A difficult task when generating text from knowledge bases (KB) is finding appropriate lexicalisations for KB symbols. We present an approach for lexicalising knowledge base relations and apply it to DBPedia data. Our model learns low-dimensional embeddings of words and RDF resources and uses these representations to score RDF properties against candidate lexicalisations. Training our model on (i) pairs of RDF triples and automatically generated verbalisations of these triples and (ii) pairs of paraphrases extracted from various resources yields competitive results on DBPedia data.


Sequence-based Structured Prediction for Semantic Parsing


Date: 18 October, 2016, 14:00, Room A008

Speaker: Chunyang Xiao

Abstract : We propose an approach for semantic parsing that uses a recurrent neural network to map a natural language question into a logical form representation of a KB query. Following recent work by Wang et al. (2015), the interpretable logical forms, which are structured objects obeying certain constraints, are enumerated by an underlying grammar and paired with their canonical realizations. To use sequence prediction, we need to sequentialize these logical forms.

We compare three sequentializations: a direct linearization of the logical form, a linearization of the associated canonical realization, and a sequence consisting of derivation steps relative to the underlying grammar. We also show how grammatical constraints on the derivation sequence can easily be integrated inside the RNN-based sequential predictor. Our experiments show substantial improvements over previous results on the same dataset, and also demonstrate the advantage of incorporating the grammatical constraints.
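One common way to integrate grammatical constraints into a sequential predictor is to restrict each decoding step to the symbols the grammar licenses. The sketch below illustrates that idea only: `toy_scores` stands in for the RNN's output scores, and the toy grammar, symbol names, and greedy search are all assumptions made for the example, not the system described in the talk.

```python
import math


def constrained_greedy_decode(score_fn, allowed_fn, start, end, max_len=10):
    """Greedy decoding that only considers symbols the grammar permits next."""
    seq = [start]
    while seq[-1] != end and len(seq) < max_len:
        scores = score_fn(seq)    # dict: symbol -> unnormalised score
        allowed = allowed_fn(seq)  # set of symbols licensed by the grammar
        best = max(allowed, key=lambda s: scores.get(s, -math.inf))
        seq.append(best)
    return seq


# Hypothetical toy grammar: which symbols may follow the last one.
NEXT = {
    "<s>": {"answer"},
    "answer": {"("},
    "(": {"city", "river"},
    "city": {")"},
    "river": {")"},
    ")": {"</s>"},
}


def toy_allowed(seq):
    return NEXT[seq[-1]]


def toy_scores(seq):
    # Stand-in for an RNN: fixed scores whose unconstrained argmax is
    # always ")", which would be ungrammatical at most steps.
    return {"answer": 0.5, "(": 1.0, "city": 2.0, "river": 1.5, ")": 3.0, "</s>": 0.1}


decoded = constrained_greedy_decode(toy_scores, toy_allowed, "<s>", "</s>")
# decoded == ["<s>", "answer", "(", "city", ")", "</s>"]
```

Because ungrammatical symbols are never candidates, every decoded sequence is a valid derivation even when the underlying scorer would prefer an invalid continuation.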


Exploiting Sentence and Context Representations in Deep Neural Models for Spoken Language Understanding


Date: 28 September 2016, 14:00, Room C005

Speaker: Lina Rojas-Barahona

Abstract : This paper presents a deep learning architecture for the semantic decoder component of a Statistical Spoken Dialogue System. In a slot-filling dialogue, the semantic decoder predicts the dialogue act and a set of slot-value pairs from a set of n-best hypotheses returned by the automatic speech recognition component. Most current models for spoken language understanding assume (i) word-aligned semantic annotations, as in sequence taggers, and (ii) delexicalisation, that is, a mapping of input words to domain-specific concepts using heuristics that try to capture morphological variation but do not scale to other domains or to language variation (e.g., morphology, synonyms, paraphrasing). In this work, the semantic decoder is trained using unaligned semantic annotations and uses distributed semantic representation learning to overcome the limitations of explicit delexicalisation. The proposed architecture uses a convolutional neural network for the sentence representation and a long short-term memory network for the context representation. Results are presented for the publicly available DSTC2 corpus and an in-car corpus that is similar to DSTC2 but has a significantly higher word error rate (WER).


Project GolFred Presentation


Date: 16 September, 2016, 14:00, Room LORIA B-011

Speaker: Émilie Colin

Abstract : The GolFred project is about machine reading for narrative generation of spatial experiences in service robots. A robot, Golem, reads the panels it finds while moving through a real environment. The phrases read by Golem are transformed by Fred into a semantic representation, which is then linked to and enriched with DBPedia knowledge. The task of the Synalp team is to develop a generator that works from the final representation produced by Fred. These representations may contain any kind of event, role, or specification. I worked on VerbNet to generate a set of grammar trees linked to semantic schemas. I will present GenI, VerbNet, and their association, and will close my presentation with the work on Fred data.


Multimodal content-aware image thumbnailing


Date: 22 September, 2016, 10:00, Room B-011

Speaker: Kohei Yamamoto

Abstract : In this presentation, I will introduce my previous research topic, multimodal image thumbnailing. As background, mobile applications (in this case, news article recommendation) face a key problem: they must eliminate redundant information in order to present more relevant information within limited time and space. To tackle this problem, I proposed a multimodal image thumbnailing method that considers both images and text. The proposed method generates an energy map expressing content by aligning image fragments and words via multimodal neural networks; an appropriate region can then be cropped with respect to the corresponding text by using the energy map. We evaluate this approach on a real data set based on news articles that appeared on Yahoo! JAPAN. Experimental results demonstrate the effectiveness of the proposed method.
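The abstract does not say how the crop is chosen from the energy map. One simple possibility, shown here purely as an assumption, is to pick the fixed-size window with the highest total energy. The `best_crop` helper and the toy 4x4 map are hypothetical.

```python
def best_crop(energy, crop_h, crop_w):
    """Return ((top, left), score) of the crop_h x crop_w window with maximal summed energy."""
    rows, cols = len(energy), len(energy[0])
    best_score, best_pos = float("-inf"), (0, 0)
    for top in range(rows - crop_h + 1):
        for left in range(cols - crop_w + 1):
            score = sum(
                energy[r][c]
                for r in range(top, top + crop_h)
                for c in range(left, left + crop_w)
            )
            if score > best_score:
                best_score, best_pos = score, (top, left)
    return best_pos, best_score


# Toy energy map, invented for illustration; higher values mean the region
# is assumed to be more relevant to the accompanying text.
energy_map = [
    [0, 0, 0, 0],
    [0, 1, 5, 4],
    [0, 2, 6, 3],
    [0, 0, 0, 0],
]
pos, score = best_crop(energy_map, 2, 2)
# pos == (1, 2), score == 18
```

This brute-force search is quadratic in the map size per window. A summed-area table would be the usual optimisation for real images.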


Unsupervised Ranking of Knowledge Bases for Named Entity Recognition


Date: 02 September, 2016, 10:00, Room LORIA A-008

Speaker: Yassine M'rabet (Lister Hill National Center for Biomedical Communications, National Library of Medicine, USA)

Abstract : With the continuous growth of freely accessible knowledge bases and the heterogeneity of textual corpora, selecting the most adequate knowledge base for named entity recognition is becoming a challenge in itself. In this talk, we will present an unsupervised method to rank knowledge bases according to their adequacy for the recognition of named entities in a given corpus. Building on a state-of-the-art, unsupervised entity linking approach, we propose several evaluation metrics to measure the lexical and structural adequacy of a knowledge base for a given corpus. We study the correlation between these metrics and three standard performance measures: precision, recall and F1 score. Our multi-domain experiments on 9 different corpora with 6 knowledge bases show that three of the proposed metrics are strong performance predictors having 0.62 to 0.76 Pearson correlation with precision and 0.96 correlation with both recall and F1 score.
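For readers unfamiliar with the statistic, the Pearson correlation between an adequacy metric and a performance measure can be computed as in the sketch below. The input lists are synthetic placeholders, not values from the experiments.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


# Synthetic placeholders: one hypothetical metric value and one hypothetical
# performance score per (corpus, knowledge base) pair.
toy_metric = [1.0, 2.0, 3.0, 4.0]
toy_score = [2.0, 4.0, 6.0, 8.0]
r = pearson(toy_metric, toy_score)  # perfectly linear toy data, so r is close to 1
```

A value near 1 means the metric ranks knowledge bases in nearly the same order as the performance measure, which is what makes it usable as a predictor.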


Diachronic Clustering Tools for Analysing the Evolution of Scientific Output


Date: 01 July, 2016, 14:30, Room LORIA B-011

Speaker: Nicolas Dugué

Abstract : Within the ISTEX-R project, our mission is to make it easier to track the evolution of scientific output by studying the ISTEX publication database. In this context, we have set up a diachronic clustering solution that makes it possible to follow research topics over time: merging, splitting, appearance, disappearance. We will first detail the cluster quality measurement and cluster labelling tools that our approach requires. We will then present preliminary results on an ISTEX corpus. Finally, we will describe a visualisation platform dedicated to exploring these results.
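To make the four kinds of topic change concrete, here is a minimal sketch that links clusters from two time periods by the overlap of their labels and then reads off the events. The Jaccard measure, the `threshold` value, and the toy clusters are assumptions for illustration; the abstract does not specify how the ISTEX-R tools match clusters.

```python
def jaccard(a, b):
    """Jaccard overlap between two sets of cluster labels."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def track_topics(earlier, later, threshold=0.3):
    """Link clusters across two periods and report topic events."""
    links = [
        (i, j)
        for i, a in earlier.items()
        for j, b in later.items()
        if jaccard(a, b) >= threshold
    ]
    successors = {i: [j for (x, j) in links if x == i] for i in earlier}
    predecessors = {j: [i for (i, y) in links if y == j] for j in later}
    events = []
    for i, js in successors.items():
        if not js:
            events.append(("disappearance", i))
        elif len(js) > 1:
            events.append(("split", i, tuple(js)))
    for j, sources in predecessors.items():
        if not sources:
            events.append(("appearance", j))
        elif len(sources) > 1:
            events.append(("merge", tuple(sources), j))
    return events


# Toy clusters described by label sets, invented for illustration.
period_1 = {
    "A": {"parsing", "grammar", "syntax"},
    "B": {"speech", "acoustic", "hmm"},
    "C": {"ocr", "scanning"},
}
period_2 = {
    "X": {"parsing", "grammar", "speech", "acoustic"},
    "Y": {"embedding", "neural"},
}
events = track_topics(period_1, period_2)
```

With these toy inputs, clusters A and B both link to X (a merge), C has no successor (a disappearance), and Y has no predecessor (an appearance).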


Contents © 2016 Christophe Cerisara - Powered by Nikola