Experimental evaluation with cross-validation    Posted:

It is extremely easy to make methodological mistakes when evaluating a machine learning system and comparing it with others.

This is especially true when using cross-validation! A must-read paper on this topic is:

On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation (Cawley and Talbot, JMLR, 2010)

So even when you are short of time, whenever you run a model evaluation, please take the time to double-check that your evaluation methodology is sound. This matters most in cross-validation experiments, where you must always tune your hyper-parameters with a nested cross-validation procedure. Otherwise the test folds that produce the reported score have also been used to select the model, and the estimate is optimistically biased.
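
To make the recommendation concrete, here is a minimal sketch of nested cross-validation in plain Python. Everything in it is an assumption made for illustration and does not come from the paper: the toy 1-D dataset, the k-nearest-neighbour classifier, the candidate grid, and the helper names (`k_fold_indices`, `cross_val_score`, `nested_cv`). The point is only the structure: the hyper-parameter is chosen by an inner cross-validation that sees the outer training folds alone, so the outer test fold never influences model selection.

```python
import random
from statistics import mean


def k_fold_indices(n, k, rng):
    """Shuffle range(n) and split it into k disjoint, roughly equal folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (1-D feature)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def accuracy(train, test, k):
    """Fraction of test points classified correctly by k-NN fit on train."""
    return mean(1.0 if knn_predict(train, x, k) == y else 0.0 for x, y in test)


def split(data, fold):
    """Return (train, test) where test is the points indexed by fold."""
    held_out = set(fold)
    train = [p for j, p in enumerate(data) if j not in held_out]
    test = [data[j] for j in fold]
    return train, test


def cross_val_score(data, k, n_folds, rng):
    """Plain (non-nested) cross-validated accuracy for one value of k."""
    folds = k_fold_indices(len(data), n_folds, rng)
    return mean(accuracy(*split(data, fold), k) for fold in folds)


def nested_cv(data, grid, outer_folds=5, inner_folds=3, seed=0):
    """Outer loop estimates performance; inner loop tunes k on train only."""
    rng = random.Random(seed)
    outer_scores = []
    for fold in k_fold_indices(len(data), outer_folds, rng):
        train, test = split(data, fold)
        # Model selection uses the outer training set only.
        best_k = max(grid, key=lambda k: cross_val_score(train, k, inner_folds, rng))
        # The outer test fold is touched once, with the selected k.
        outer_scores.append(accuracy(train, test, best_k))
    return mean(outer_scores)


def make_toy_data(n=120, seed=1):
    """Hypothetical data: two overlapping 1-D Gaussian classes."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data


data = make_toy_data()
score = nested_cv(data, grid=[1, 3, 5, 9])
print(f"nested CV accuracy estimate: {score:.3f}")
```

The common mistake is to run `cross_val_score` on the full dataset for every value in the grid and report the best score: that number has been selected on the same folds it is measured on. In the sketch, the reported score comes only from folds that played no part in choosing `k`.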


Automatic Creation of a Syntactic-Semantic Grammar    Posted:

Date: 23 June, 2017, 14:00, Room LORIA C-003

Speaker: Émilie Colin

Abstract : We propose a new method for the automatic creation of lexicalised syntactic-semantic grammars. At present, grammars are created either by hand or by automated processing of a treebank. Our proposal is to extract from VerbNet data a core grammar of English (canonical forms of verbs and noun phrases) that incorporates VerbNet semantics. Our goal is to take advantage of the large existing resources to produce a high-quality symbolic text generation system for restricted domains.


Mapping Natural Language to Description Logic    Posted:

Date: 12 May, 2017, 10:30, Room LORIA B-011

Speaker: Bikash Gyawali

Abstract : While much work on automated ontology enrichment has focused on mining text for concepts and relations, little attention has been paid to the task of enriching ontologies with complex axioms. In this paper, we focus on a form of text that is frequent in industry, namely system installation design principles (SIDPs), and we present a framework which can be used both to map SIDPs to OWL DL axioms and to assess the quality of these automatically derived axioms. We present experimental results on a set of 960 SIDPs provided by Airbus which demonstrate (i) that the approach is robust (97.50% of the SIDPs can be parsed) and (ii) that DL axioms assigned to full parses are very likely to be correct (in 96% of the cases).


Acquiring Knowledge from Multimodal Sources to Aid Language Understanding    Posted:

Date : 8 March 2017, 14:00, Room A008

Speaker: Marie-Francine Moens

Abstract : Human language understanding (HLU) by a machine is of great economic and social value. In this lecture we consider language understanding of written text. First, we give an overview of the latest methods for HLU that map language to a formal knowledge representation which facilitates other automated tasks. Most current HLU systems are trained on manually annotated texts, which are often lacking in open-domain applications. In addition, much content is left implicit in a text, and human readers infer it by relying on their world and common-sense knowledge. We go deeper into the field of representation learning, which is currently much studied in computational linguistics. This field investigates methods for representing language as statistical concepts or as vectors, allowing straightforward methods of compositionality. The methods often use deep learning and its underlying neural network technologies to learn concepts from large text collections in an unsupervised way (i.e., without the need for manual annotations). We show how these methods can help, but also demonstrate that they are still insufficient to automatically acquire the necessary background knowledge, and more specifically the world and common-sense knowledge needed for language understanding. We then look more closely at how we can learn knowledge jointly from textual and visual data to help language understanding, which will be illustrated with the first results obtained in the MUSTER CHIST-ERA project.


A review of NIPS 2016    Posted:

Date : 3 February 2017, 14:00, Room C005

Speaker: Hoa Le Thien

Abstract : In this talk, I will discuss the current hot topics in AI and Deep Learning research, such as Deep Reinforcement Learning, Generative Adversarial Networks (GAN), Limitations of RNNs, Obstacles of Deep Learning & NLP, and Learning to Learn. The aim of the presentation is to give you a rigorous framework and the most up-to-date elements for future research directions. This will include inspiration from different fields (such as robotics and computer vision) and their direct application to NLP.


Distributional Semantic Spaces: Creation and Applications    Posted:

Date: 30 November, 2016, 14:00

Speaker: Denis Paperno

Abstract : Distributional semantic vectors (also known as word embeddings) are increasingly popular in various natural language tasks. The talk will describe how distributional semantic models are created, investigate some of the model hyperparameters, and illustrate their applications.


XMG2: Describing Description Languages    Posted:

Date : 01 December 2016, 11:00 am, Room B013

Speaker: Yannick Parmentier

Abstract : In this talk, we introduce XMG2, a modular and extensible tool for various linguistic description tasks. Based on the notion of meta-compilation (that is, compilation of compilers), XMG2 reuses the main concepts underlying XMG, namely logic programming and constraint satisfaction, to generate on-demand XMG-like compilers by assembling elementary units called language bricks. This brick-based definition of compilers permits users to design description languages in a highly flexible way. In particular, it makes it possible to support several levels of linguistic description (e.g. syntax, morphology) within a single description language. XMG2 aims to offer users the means to easily define description languages that fit linguistic intuition as closely as possible.


Is Very Deep Convolutional Neural Network necessary for Text Classification?    Posted:

Date : 01 December 2016, 10:00 am, Room B013

Speaker: Hoa Le Thien

Abstract : Convolutional Neural Networks have long been popular for image classification because they reach state-of-the-art performance when they are made very deep. The same has been demonstrated for speech recognition, but is it always the case for text classification? Many results argue against this assumption. In this presentation, I will briefly explain the structure of a shallow Convolutional Neural Network and then compare its results with those of a Very Deep ConvNet. Other models, such as word2vec and fasttext, will also be discussed. The presentation will conclude with a new perspective for research.


Learning Embeddings to lexicalise RDF Properties    Posted:

Date : 10 November 2016, 10:30am, Room B013

Speaker: Laura Perez-Beltrachini

Abstract : A difficult task when generating text from knowledge bases (KB) consists in finding appropriate lexicalisations for KB symbols. We present an approach for lexicalising knowledge base relations and apply it to DBPedia data. Our model learns low-dimensional embeddings of words and RDF resources and uses these representations to score RDF properties against candidate lexicalisations. Training our model using (i) pairs of RDF triples and automatically generated verbalisations of these triples and (ii) pairs of paraphrases extracted from various resources yields competitive results on DBPedia data.


Sequence-based Structured Prediction for Semantic Parsing    Posted:

Date: 18 October, 2016, 14:00, Room A008

Speaker: Chunyang Xiao

Abstract : We propose an approach for semantic parsing that uses a recurrent neural network to map a natural language question into a logical form representation of a KB query. Building on recent work by Wang et al. (2015), the interpretable logical forms, which are structured objects obeying certain constraints, are enumerated by an underlying grammar and are paired with their canonical realizations. In order to use sequence prediction, we need to sequentialize these logical forms.

We compare three sequentializations: a direct linearization of the logical form, a linearization of the associated canonical realization, and a sequence consisting of derivation steps relative to the underlying grammar. We also show how grammatical constraints on the derivation sequence can easily be integrated inside the RNN-based sequential predictor. Our experiments show substantial improvements over previous results on the same dataset, and also demonstrate the advantage of incorporating the grammatical constraints.


Contents © 2017 Christophe Cerisara - Powered by Nikola