Expansion des contractions de la langue anglaise en Python

La langue anglaise est un couple de contractions . Par exemple:

you've -> you have
he's -> he is

ceux-ci peuvent parfois causer des maux de tête lorsque vous faites le traitement du langage naturel. Y a-t-il une bibliothèque Python, qui peut étendre ces contractions?

23
demandé sur Abdo 2013-11-05 17:32:22

6 réponses

j'ai fait que la page contraction-to-expansion de wikipedia dans un dictionnaire de python (voir ci-dessous)

notez, comme vous pouvez vous y attendre, que vous voulez certainement utiliser des guillemets doubles lorsque vous interrogez le dictionnaire:

enter image description here

aussi, j'ai laissé plusieurs options dans comme dans la page wikipedia. N'hésitez pas à modifier comme vous le souhaitez. Notez que la désambiguïsation de la bonne extension serait un problème délicat!

contractions = { 
"ain't": "am not / are not / is not / has not / have not",
"aren't": "are not / am not",
"can't": "cannot",
"can't've": "cannot have",
"'cause": "because",
"could've": "could have",
"couldn't": "could not",
"couldn't've": "could not have",
"didn't": "did not",
"doesn't": "does not",
"don't": "do not",
"hadn't": "had not",
"hadn't've": "had not have",
"hasn't": "has not",
"haven't": "have not",
"he'd": "he had / he would",
"he'd've": "he would have",
"he'll": "he shall / he will",
"he'll've": "he shall have / he will have",
"he's": "he has / he is",
"how'd": "how did",
"how'd'y": "how do you",
"how'll": "how will",
"how's": "how has / how is / how does",
"I'd": "I had / I would",
"I'd've": "I would have",
"I'll": "I shall / I will",
"I'll've": "I shall have / I will have",
"I'm": "I am",
"I've": "I have",
"isn't": "is not",
"it'd": "it had / it would",
"it'd've": "it would have",
"it'll": "it shall / it will",
"it'll've": "it shall have / it will have",
"it's": "it has / it is",
"let's": "let us",
"ma'am": "madam",
"mayn't": "may not",
"might've": "might have",
"mightn't": "might not",
"mightn't've": "might not have",
"must've": "must have",
"mustn't": "must not",
"mustn't've": "must not have",
"needn't": "need not",
"needn't've": "need not have",
"o'clock": "of the clock",
"oughtn't": "ought not",
"oughtn't've": "ought not have",
"shan't": "shall not",
"sha'n't": "shall not",
"shan't've": "shall not have",
"she'd": "she had / she would",
"she'd've": "she would have",
"she'll": "she shall / she will",
"she'll've": "she shall have / she will have",
"she's": "she has / she is",
"should've": "should have",
"shouldn't": "should not",
"shouldn't've": "should not have",
"so've": "so have",
"so's": "so as / so is",
"that'd": "that would / that had",
"that'd've": "that would have",
"that's": "that has / that is",
"there'd": "there had / there would",
"there'd've": "there would have",
"there's": "there has / there is",
"they'd": "they had / they would",
"they'd've": "they would have",
"they'll": "they shall / they will",
"they'll've": "they shall have / they will have",
"they're": "they are",
"they've": "they have",
"to've": "to have",
"wasn't": "was not",
"we'd": "we had / we would",
"we'd've": "we would have",
"we'll": "we will",
"we'll've": "we will have",
"we're": "we are",
"we've": "we have",
"weren't": "were not",
"what'll": "what shall / what will",
"what'll've": "what shall have / what will have",
"what're": "what are",
"what's": "what has / what is",
"what've": "what have",
"when's": "when has / when is",
"when've": "when have",
"where'd": "where did",
"where's": "where has / where is",
"where've": "where have",
"who'll": "who shall / who will",
"who'll've": "who shall have / who will have",
"who's": "who has / who is",
"who've": "who have",
"why's": "why has / why is",
"why've": "why have",
"will've": "will have",
"won't": "will not",
"won't've": "will not have",
"would've": "would have",
"wouldn't": "would not",
"wouldn't've": "would not have",
"y'all": "you all",
"y'all'd": "you all would",
"y'all'd've": "you all would have",
"y'all're": "you all are",
"y'all've": "you all have",
"you'd": "you had / you would",
"you'd've": "you would have",
"you'll": "you shall / you will",
"you'll've": "you shall have / you will have",
"you're": "you are",
"you've": "you have"
}
36
répondu arturomp 2017-12-06 18:02:39

Vous n'avez pas besoin d'une bibliothèque, il est possible de faire avec reg exp par exemple.

>>> import re
>>> contractions_dict = {
...     'didn\'t': 'did not',
...     'don\'t': 'do not',
... }
>>> contractions_re = re.compile('(%s)' % '|'.join(contractions_dict.keys()))
>>> def expand_contractions(s, contractions_dict=contractions_dict):
...     def replace(match):
...         return contractions_dict[match.group(0)]
...     return contractions_re.sub(replace, s)
...
>>> expand_contractions('You don\'t need a library')
'You do not need a library'
14
répondu alko 2013-11-05 13:39:52

Les réponses ci-dessus fonctionne parfaitement bien et pourrait être mieux pour les ambigu de la contraction (bien que je dirais qu'il n'y a pas beaucoup de cas ambigu). Je voudrais utiliser quelque chose qui est plus lisible et plus facile à maintenir:

import re

def decontracted(phrase):
    # specific
    phrase = re.sub(r"won't", "will not", phrase)
    phrase = re.sub(r"can\'t", "can not", phrase)

    # general
    phrase = re.sub(r"n\'t", " not", phrase)
    phrase = re.sub(r"\'re", " are", phrase)
    phrase = re.sub(r"\'s", " is", phrase)
    phrase = re.sub(r"\'d", " would", phrase)
    phrase = re.sub(r"\'ll", " will", phrase)
    phrase = re.sub(r"\'t", " not", phrase)
    phrase = re.sub(r"\'ve", " have", phrase)
    phrase = re.sub(r"\'m", " am", phrase)
    return phrase


test = "Hey I'm Yann, how're you and how's it going ? That's interesting: I'd love to hear more about it."
print(decontracted(test))
# Hey I am Yann, how are you and how is it going ? That is interesting: I would love to hear more about it.

il y a peut-être des défauts auxquels je n'ai pas pensé.

repris de mon autre réponse

4
répondu Yann Dubois 2017-12-11 16:05:52

je voudrais ajouter peu à la réponse d'alko ici. Si vous vérifiez wikipedia, le nombre de contractions de langue anglaise comme mentionné il ya moins de 100. Accordée, dans le scénario réel, ce nombre pourrait être plus que cela. Mais encore, je suis assez sûr que 200-300 mots sont tout ce que vous aurez pour les mots de contraction anglais. Maintenant, Voulez-vous obtenir une bibliothèque séparée pour ceux (Je ne pense pas que ce que vous recherchez existe réellement, cependant)?. Cependant, vous pouvez facilement résoudre ce problème avec dictionnaire et utilisation de regex. Je recommande l'utilisation d'un tokenizer nice comme Natural Language Toolkit et le reste vous ne devriez pas avoir de problème dans la mise en œuvre vous-même.

3
répondu Jack_of_All_Trades 2013-11-05 13:47:49

c'est une bibliothèque très cool et facile à utiliser pour le but https://pypi.python.org/pypi/pycontractions/1.0.1 .

Exemple d'utilisation (détaillé dans le lien):

from pycontractions import Contractions

# Load your favorite word2vec model
cont = Contractions('GoogleNews-vectors-negative300.bin')

# optional, prevents loading on first expand_texts call
cont.load_models()

out = list(cont.expand_texts(["I'd like to know how I'd done that!",
                            "We're going to the zoo and I don't think I'll be home for dinner.",
                            "Theyre going to the zoo and she'll be home for dinner."], precise=True))
print(out)

vous aurez également besoin de GoogleNews-vectors-negative300.bin, lien pour télécharger dans le lien pycontractions ci-dessus. * Exemple de code en python3.

2
répondu Joe9008 2018-04-17 09:43:10

même si c'est une vieille question, j'ai pensé que je pourrais aussi bien répondre puisqu'il n'y a toujours pas de VRAIE solution à ce que je peux voir.

j'ai dû travailler là-dessus sur un projet relatif à la NLP et j'ai décidé de m'attaquer au problème puisqu'il ne semblait pas y avoir quoi que ce soit ici. Vous pouvez consulter mon expander github si vous êtes intéressé.

c'est un programme assez mal optimisé (je pense) basé sur NLTK, le Stanford Core modèles NLP, que vous devrez télécharger séparément, et le dictionnaire dans la réponse précédente . Toutes les informations nécessaires doivent être dans le README et le code abondamment commenté. Je sais que le code commenté est un code mort, mais c'est juste la façon dont j'écris pour garder les choses claires pour moi.

l'exemple d'entrée dans expander.py sont les phrases suivantes:

    ["I won't let you get away with that",  # won't ->  will not
    "I'm a bad person",  # 'm -> am
    "It's his cat anyway",  # 's -> is
    "It's not what you think",  # 's -> is
    "It's a man's world",  # 's -> is and 's possessive
    "Catherine's been thinking about it",  # 's -> has
    "It'll be done",  # 'll -> will
    "Who'd've thought!",  # 'd -> would, 've -> have
    "She said she'd go.",  # she'd -> she would
    "She said she'd gone.",  # she'd -> had
    "Y'all'd've a great time, wouldn't it be so cold!", # Y'all'd've -> You all would have, wouldn't -> would not
    " My name is Jack.",   # No replacements.
    "'Tis questionable whether Ma'am should be going.", # 'Tis -> it is, Ma'am -> madam
    "As history tells, 'twas the night before Christmas.", # 'Twas -> It was
    "Martha, Peter and Christine've been indulging in a menage-à-trois."] # 've -> have

dont la sortie est

    ["I will not let you get away with that",
    "I am a bad person",
    "It is his cat anyway",
    "It is not what you think",
    "It is a man's world",
    "Catherine has been thinking about it",
    "It will be done",
    "Who would have thought!",
    "She said she would go.",
    "She said she had gone.",
    "You all would have a great time, would not it be so cold!",
    "My name is Jack.",
    "It is questionable whether Madam should be going.",
    "As history tells, it was the night before Christmas.",
    "Martha, Peter and Christine have been indulging in a menage-à-trois."]

donc pour ce petit jeu de phrases de test, je suis venu avec pour tester quelques bord-cases, il fonctionne bien.

depuis que ce projet a perdu de son importance en ce moment, je ne suis plus en train de le développer activement. Toute aide sur ce projet serait appréciée. Les choses à faire sont écrites dans la liste des choses à faire. Ou si vous avez des conseils pour améliorer mon python, je vous en serais reconnaissant.

0
répondu Yannick 2017-10-18 10:23:02