Find all occurrences of a substring in Python

Question

Python has string.find() and string.rfind() to get the index of a substring in a string.

I'm wondering whether there is something like string.find_all() that returns all found indexes (not only the first from the beginning or the first from the end).

For example:

string = "test test test test"

print(string.find('test'))  # 0
print(string.rfind('test'))  # 15

# that's the goal (find_all does not exist on str)
print(string.find_all('test'))  # [0, 5, 10, 15]

244

python regex string

asked by nihiser 2011-01-12 05:35:18

15 answers

score 383 · Answer 1

There is no simple built-in string function that does what you're looking for, but you could use the more powerful regular expressions:

import re
[m.start() for m in re.finditer('test', 'test test test test')]
#[0, 5, 10, 15]

If you want to find overlapping matches, a lookahead will do that:

[m.start() for m in re.finditer('(?=tt)', 'ttt')]
#[0, 1]

If you want a reverse find-all without overlaps, you can combine a positive and a negative lookahead into an expression like this:

search = 'tt'
[m.start() for m in re.finditer('(?=%s)(?!.{1,%d}%s)' % (search, len(search)-1, search), 'ttt')]
#[1]

re.finditer returns an iterator, so you could change the [] in the above to () to get a generator instead of a list, which will be more efficient if you only iterate through the results once.
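The regex variants in this answer can be wrapped in one small helper. This is only a sketch: the name find_all_re and its overlapping parameter are made up for illustration and are not part of the re module. It assumes the search text should be matched literally, so it is passed through re.escape first.

```python
import re

def find_all_re(sub, text, overlapping=False):
    """Return an iterator over the start indexes of the literal `sub` in `text`."""
    if not sub:
        raise ValueError("sub must be a non-empty string")
    literal = re.escape(sub)  # treat regex metacharacters in sub as plain text
    # A zero-width lookahead consumes nothing, so overlapping hits are reported.
    pattern = "(?=%s)" % literal if overlapping else literal
    return (m.start() for m in re.finditer(pattern, text))

print(list(find_all_re('test', 'test test test test')))  # [0, 5, 10, 15]
print(list(find_all_re('tt', 'ttt', overlapping=True)))  # [0, 1]
```

Without re.escape, a search string such as 'a.b' would also match 'axb', because the dot is a regex wildcard.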

score 79 · Answer 2

>>> help(str.find)
Help on method_descriptor:

find(...)
    S.find(sub [,start [,end]]) -> int

Thus, we can build it ourselves:

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

list(find_all('spam spam spam spam', 'spam')) # [0, 5, 10, 15]

No temporary strings or regexes required.
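One edge case worth noting: with an empty sub, the generator above never advances (start += len(sub) adds 0) and yields 0 forever. A possible variant that guards against this is sketched below; the name find_all_safe and the overlapping flag are illustrative additions, not part of the original answer.

```python
def find_all_safe(a_str, sub, overlapping=False):
    """Yield the start index of each occurrence of sub in a_str."""
    if not sub:
        # An empty needle matches everywhere and would never advance the search.
        raise ValueError("sub must be a non-empty string")
    step = 1 if overlapping else len(sub)
    start = a_str.find(sub)
    while start != -1:
        yield start
        start = a_str.find(sub, start + step)

print(list(find_all_safe('spam spam spam spam', 'spam')))  # [0, 5, 10, 15]
print(list(find_all_safe('ttt', 'tt', overlapping=True)))  # [0, 1]
```

Because this is a generator, the ValueError is raised when iteration starts, not when the function is called.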

score 35 · Answer 3

Here's a (very inefficient) way to get all (i.e. even overlapping) matches:

>>> string = "test test test test"
>>> [i for i in range(len(string)) if string.startswith('test', i)]
[0, 5, 10, 15]

score 17 · Answer 4

You can use re.finditer() for non-overlapping matches.

>>> import re
>>> aString = 'this is a string where the substring "is" is repeated several times'
>>> print([(a.start(), a.end()) for a in re.finditer('is', aString)])
[(2, 4), (5, 7), (38, 40), (42, 44)]

But it won't work for overlapping matches:

In [1]: aString="ababa"

In [2]: print([(a.start(), a.end()) for a in re.finditer('aba', aString)])
Output: [(0, 3)]
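For the overlapping case, the lookahead approach from Answer 1 can be adapted to report spans as well. A minimal sketch follows; because a lookahead match is zero-width, the end of each span is computed from len(sub) instead of m.end(), and re.escape is added on the assumption that the substring is meant literally.

```python
import re

a_string = "ababa"
sub = "aba"

# The lookahead matches at each position where sub begins, without consuming it.
pattern = "(?=%s)" % re.escape(sub)
spans = [(m.start(), m.start() + len(sub)) for m in re.finditer(pattern, a_string)]
print(spans)  # [(0, 3), (2, 5)]
```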

score 15 · Answer 5

Come, let us recurse together.

def locations_of_substring(string, substring):
    """Return a list of locations of a substring."""

    substring_length = len(substring)    
    def recurse(locations_found, start):
        location = string.find(substring, start)
        if location != -1:
            return recurse(locations_found + [location], location+substring_length)
        else:
            return locations_found

    return recurse([], 0)

print(locations_of_substring('this is a test for finding this and this', 'this'))
# prints [0, 27, 36]

No need for regular expressions this way.

score 14 · Answer 6

Again, old thread, but here's my solution using a generator and plain str.find.

def findall(p, s):
    '''Yields all the positions of
    the pattern p in the string s.'''
    i = s.find(p)
    while i != -1:
        yield i
        i = s.find(p, i+1)

Example:

x = 'banananassantana'
[(i, x[i:i+2]) for i in findall('na', x)]

returns

[(2, 'na'), (4, 'na'), (6, 'na'), (14, 'na')]

score 7 · Answer 7

If you're just looking for a single character, this would work:

from functools import reduce  # reduce is not a builtin in Python 3

string = "dooobiedoobiedoobie"
match = 'o'
reduce(lambda count, char: count + 1 if char == match else count, string, 0)
# produces 7

Also,

string = "test test test test"
match = "test"
len(string.split(match)) - 1
# produces 4

My hunch is that neither of these (especially #2) is terribly performant. Note that both return a count of occurrences, not their positions.
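If only the number of occurrences is needed, the built-in str.count gives the same results as both snippets above without reduce or split. It counts non-overlapping occurrences. A short sketch, reusing the strings from this answer (the variable names are illustrative):

```python
string = "dooobiedoobiedoobie"
single_char_count = string.count("o")
print(single_char_count)  # 7

word_count = "test test test test".count("test")
print(word_count)  # 4

# Occurrences are counted without overlap, so 'tt' is found once in 'ttt'.
overlap_count = "ttt".count("tt")
print(overlap_count)  # 1
```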

score 7 · Answer 8

This is an old thread, but I got interested and wanted to share my solution.

def find_all(a_string, sub):
    result = []
    k = 0
    while k < len(a_string):
        k = a_string.find(sub, k)
        if k == -1:
            return result
        else:
            result.append(k)
            k += 1 #change to k += len(sub) to not search overlapping results
    return result

It should return a list of the positions where the substring was found. Please comment if you see an error or room for improvement.

score 4 · Answer 9

This thread is a little old, but this worked for me:

numberString = "onetwothreefourfivesixseveneightninefiveten"
testString = "five"

marker = 0
while marker < len(numberString):
    try:
        print(numberString.index(testString, marker))
        marker = numberString.index(testString, marker) + 1
    except ValueError:
        print("String not found")
        marker = len(numberString)

score 1 · Answer 10

The solutions proposed by others are all based on the built-in find() method or other available methods.

What is the basic algorithm for finding all occurrences of a substring in a string?

def find_all(string,substring):
    """
    Function: Returning all the index of substring in a string
    Arguments: String and the search string
    Return:Returning a list
    """
    length = len(substring)
    c=0
    indexes = []
    while c < len(string):
        if string[c:c+length] == substring:
            indexes.append(c)
        c=c+1
    return indexes

You can also inherit from the str class in a new class and use the function below as a method.

class newstr(str):
    # Here `string` plays the role usually given to `self`.
    def find_all(string, substring):
        """
        Function: Returning all the index of substring in a string
        Arguments: String and the search string
        Return:Returning a list
        """
        length = len(substring)
        c = 0
        indexes = []
        while c < len(string):
            if string[c:c+length] == substring:
                indexes.append(c)
            c = c + 1
        return indexes

Calling the method:

newstr.find_all('Do you find this answer helpful? then upvote this!', 'this')
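A str subclass can also expose the search as an ordinary instance method, so that it is called on the string itself. The sketch below is one possible shape, not the answer's own code: the class name SearchableStr is hypothetical, and it uses str.startswith with a start offset in place of slicing.

```python
class SearchableStr(str):
    """Hypothetical str subclass that adds a find_all method."""

    def find_all(self, substring):
        """Return every index (overlaps included) at which substring starts."""
        if not substring:
            return []
        # Stop at the last position where substring could still fit.
        last_start = len(self) - len(substring)
        return [i for i in range(last_start + 1) if self.startswith(substring, i)]

text = SearchableStr("test test test test")
print(text.find_all("test"))  # [0, 5, 10, 15]
```

Only values created as SearchableStr have the method; ordinary str objects and the results of most string operations on a SearchableStr do not.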

score 1 · Answer 11

You can try:

>>> string = "test test test test"
>>> for index, value in enumerate(string):
...     if string[index:index+(len("test"))] == "test":
...         print(index)

0
5
10
15

score 1 · Answer 12

This does the trick for me using re.finditer:

import re

text = 'This is sample text to test if this pythonic '\
       'program can serve as an indexing platform for '\
       'finding words in a paragraph. It can give '\
       'values as to where the word is located with the '\
       'different examples as stated'

#  find all occurrences of the word 'as' in the above text

find_the_word = re.finditer('as', text)

for match in find_the_word:
    print('start {}, end {}, search string \'{}\''.
          format(match.start(), match.end(), match.group()))

score 0 · Answer 13

When searching for a large number of keywords in a document, use flashtext:

from flashtext import KeywordProcessor
words = ['test', 'exam', 'quiz']
txt = 'this is a test'
kwp = KeywordProcessor()
kwp.add_keywords_from_list(words)
result = kwp.extract_keywords(txt, span_info=True)

Flashtext runs faster than regex on a large list of search words.

score -1 · Answer 14

The pythonic way would be:

mystring = 'Hello World, this should work!'
# startswith(s, x) lets this work for substrings of any length, not only single characters
find_all = lambda c, s: [x for x in range(len(c)) if c.startswith(s, x)]

# s represents the search string
# c represents the character string

find_all(mystring, 'o')    # will return all positions of 'o'
# [4, 7, 20, 26]

score -2 · Answer 15

Please look at the code below:

#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''


def get_substring_indices(text, s):
    result = [i for i in range(len(text)) if text.startswith(s, i)]
    return result


if __name__ == '__main__':
    text = "How much wood would a wood chuck chuck if a wood chuck could chuck wood?"
    s = 'wood'
    print(get_substring_indices(text, s))
