Department of Mathematics and Computer Science

Module: Apprentissage automatique II

  1. Information
  2. Questions

Which language is this sentence in?

0 votes

Given the following data:

"we will present an organized picture on ensemble methods" => English

"Android is by far the world's most popular mobile operating system " => English

"Sundar Pichai a étudié à l'Indian Institute of Technology" => French

"Le gouvernement chinois avait publié une directive destinée à "préserver les droits légitimes et les intérêts des citoyens en ligne" => French

"هواوي تطلق هاتفها الجديد "ميت إكس" القابل للطي في الصين" => Arabic

"کناره‎ گیری بنیانگذاران گوگل از مدیریت شرکت آلفابت" => Persian


"Facebook chatbot offers to answer tricky questions" => ????

Explain the procedure and algorithm to automatically detect the language

Asked on 20:15, Wednesday 4 Dec 2019 By Imed BOUCHRIKA
In Apprentissage automatique II

answers (6)

Answer (1)

0 votes

Well, we need data that contains all the languages, with each language's words (like a dictionary).
Then we use the edit distance to determine the language.
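This dictionary-plus-edit-distance idea can be sketched as a toy example. The tiny word lists and the `detectLanguage` helper below are illustrative stand-ins for a real dictionary, not part of any answer's actual implementation:

```javascript
// Levenshtein edit distance between two strings.
function editDistance(a, b) {
  var d = [];
  for (var i = 0; i <= a.length; i++) d[i] = [i];
  for (var j = 0; j <= b.length; j++) d[0][j] = j;
  for (var i = 1; i <= a.length; i++) {
    for (var j = 1; j <= b.length; j++) {
      var cost = a[i - 1] === b[j - 1] ? 0 : 1;
      d[i][j] = Math.min(d[i - 1][j] + 1,      // deletion
                         d[i][j - 1] + 1,      // insertion
                         d[i - 1][j - 1] + cost); // substitution
    }
  }
  return d[a.length][b.length];
}

// Tiny illustrative dictionaries (a real system needs full word lists).
var dictionaries = {
  english: ["the", "is", "and", "world", "most", "popular"],
  french: ["le", "la", "est", "et", "gouvernement", "les"]
};

// For every word of the sentence, take the smallest edit distance to any
// dictionary word, then sum; the language with the lowest total wins.
function detectLanguage(sentence) {
  var words = sentence.toLowerCase().split(/\s+/);
  var best = null, bestScore = Infinity;
  Object.keys(dictionaries).forEach(function (lang) {
    var score = words.reduce(function (sum, w) {
      var dists = dictionaries[lang].map(function (entry) {
        return editDistance(w, entry);
      });
      return sum + Math.min.apply(null, dists);
    }, 0);
    if (score < bestScore) { bestScore = score; best = lang; }
  });
  return best;
}
```

Summing per-word minimum distances makes exact dictionary hits score zero, so sentences made of known words are classified correctly even when a few words are unseen.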

Answered on 22:25, Wednesday 4 Dec 2019 by abd raouf ben tlidjane (9 points)
In Apprentissage automatique II

Answer (2)

0 votes

We use those sentences to build our training data, which contains the keywords for each language.

We then determine the keywords of the new (test) sentence,

and calculate the similarity using a distance or a probability,

by comparing the new sentence's keywords with the stored ones.
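This keyword comparison can be sketched as follows. The training list reuses some of the question's sentences, and the similarity measure (the fraction of the new sentence's words found among each language's keywords) is just one possible choice — both are illustrative:

```javascript
// Labelled training sentences (taken from the question's data).
var training = [
  { text: "we will present an organized picture on ensemble methods", lang: "English" },
  { text: "Android is by far the world's most popular mobile operating system", lang: "English" },
  { text: "Sundar Pichai a étudié à l'Indian Institute of Technology", lang: "French" },
  { text: "Le gouvernement chinois avait publié une directive", lang: "French" }
];

// Build a keyword set per language.
var keywords = {};
training.forEach(function (ex) {
  keywords[ex.lang] = keywords[ex.lang] || {};
  ex.text.toLowerCase().split(/\s+/).forEach(function (w) {
    keywords[ex.lang][w] = true;
  });
});

// Similarity = fraction of the new sentence's words seen in each
// language's keyword set; pick the language with the highest similarity.
function classify(sentence) {
  var words = sentence.toLowerCase().split(/\s+/);
  var best = null, bestSim = -1;
  Object.keys(keywords).forEach(function (lang) {
    var hits = words.filter(function (w) {
      return keywords[lang][w] === true;
    }).length;
    var sim = hits / words.length;
    if (sim > bestSim) { bestSim = sim; best = lang; }
  });
  return best;
}
```

With more training sentences per language, the same comparison could be made probabilistic (e.g. per-language word frequencies instead of a binary keyword set).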

Answered on 01:35, Thursday 5 Dec 2019 by amira bouamrane (5 points)
In Apprentissage automatique II

Answer (3)

0 votes

Natural Language Processing

Answered on 22:09, Saturday 7 Dec 2019 by khalil souaiaia (13 points)
In Apprentissage automatique II

Answer (4)

0 votes

We build a non-redundant knowledge base for each language from the training sentences.

For a new sentence, we compute the distance between each word of the new sentence and the training words.

The language of the sentence is the one corresponding to the maximum of the minimum distances.

Answered on 19:49, Sunday 8 Dec 2019 by wahiba han (4 points)
In Apprentissage automatique II

Answer (5)

0 votes

Salam alaykum,

First, a language detection algorithm is pretty self-explanatory: we take text as input and decide which human language the text is written in. We'll use Algorithmia's language identification algorithm to give it a try.

Secondly,

Language classification relies upon a primer of specialized text called a 'corpus'. There is one corpus for each language the algorithm can identify. In summary, input text is compared to each corpus, and pattern matching is used to identify the strongest correlation to a corpus.

Because there are so many potential words to profile in every language, computer scientists use algorithms called 'profiling algorithms' to create a subset of words for each language, to be used for the corpus. The most common strategy is to choose very common words. In English, we might choose words like "the", "and", "of" and "or".

In this example, we will use Node.js to make our request. Keep in mind you may perform any Algorithmia API request with any tools at your disposal. See Appendix A to learn how to configure your machine for development with the Algorithmia API in Node.js.

var algorithmia = require("algorithmia");
var client = algorithmia(process.env.ALGORITHMIA_API_KEY);
var input = "This is a demo sentence for language detection.";

client.algo("/nlp/LanguageIdentification/0.1.0").pipe(input).then(function(output) {
    if (output.error) {
        console.error(output.error);
    } else {
        console.log(output.result); // e.g. "en"
    }
});

Note that the API key should be modified to match your configuration.

This will return the string en to indicate English. You may read more about the implementation and find a list of supported languages on the algorithm's page.

Answered on 17:28, Thursday 12 Dec 2019 by noussaiba ledjemel (23 points)
In Apprentissage automatique II

Answer (6)

0 votes

The process of language identification can be represented as a system. This dataflow was originally used for text categorization; however, instead of languages, the authors used categories, and the models were called profiles. Accordingly, the first stage is the modelling stage, where language models are generated. Such models consist of features representing specific characteristics of a language: words or N-grams together with their occurrence counts in the training set. A language model is built for each language included in the training corpus. Similarly, a document model is created from the input document whose language should be determined.
After all the models have been generated, the document model is compared to every language model in the classification stage, and the distance between them is measured using classification techniques. The language model with the minimum distance to the input document identifies the language of the document.
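This modelling/classification pipeline closely matches the classic N-gram approach of Cavnar and Trenkle. A compact sketch, where character-trigram profiles built from the question's example sentences play the role of the language models, and the "out-of-place" rank distance plays the role of the classification distance (function names and the profile size are illustrative):

```javascript
// Build a character-trigram frequency profile, ranked most-frequent first.
function profile(text, nTop) {
  var counts = {};
  var s = " " + text.toLowerCase().replace(/\s+/g, " ") + " ";
  for (var i = 0; i + 3 <= s.length; i++) {
    var g = s.slice(i, i + 3);
    counts[g] = (counts[g] || 0) + 1;
  }
  return Object.keys(counts)
    .sort(function (a, b) { return counts[b] - counts[a]; })
    .slice(0, nTop);
}

// "Out-of-place" distance between a document profile and a language
// profile: sum of rank differences; trigrams missing from the language
// profile get a maximum penalty.
function outOfPlace(docProfile, langProfile) {
  var maxPenalty = langProfile.length;
  return docProfile.reduce(function (sum, g, rank) {
    var langRank = langProfile.indexOf(g);
    return sum + (langRank < 0 ? maxPenalty : Math.abs(langRank - rank));
  }, 0);
}

// One language model per language, built from the training sentences.
var langProfiles = {
  English: profile("we will present an organized picture on ensemble methods " +
                   "Android is by far the world's most popular mobile operating system", 300),
  French: profile("Sundar Pichai a étudié à l'Indian Institute of Technology " +
                  "Le gouvernement chinois avait publié une directive destinée à " +
                  "préserver les droits", 300)
};

// Classification stage: pick the language model closest to the document model.
function identify(text) {
  var doc = profile(text, 300);
  var best = null, bestDist = Infinity;
  Object.keys(langProfiles).forEach(function (lang) {
    var d = outOfPlace(doc, langProfiles[lang]);
    if (d < bestDist) { bestDist = d; best = lang; }
  });
  return best;
}
```

Because N-grams capture sub-word patterns, this approach stays robust on short inputs and on words never seen in training, which is why it is a common baseline for language identification.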

Answered on 21:56, Friday 3 Jan 2020 by sihem djélamda (9 points)
In Apprentissage automatique II

Do you have an answer?