Find The Needles: Python Version

Find The Needles: Python Version

Find The Needles: Python Version I took a stab at explaining a search algorithm written in python with some suggested improvements.

def findNeedles(haystack, needlesArr):
 if len(needlesArr) > 5:
 sys.stderr.write('Too many words!')
 else:
 countArray = [0]*len(needlesArr)
 for i in range(len(needlesArr)):
 words = re.split("[ \"\'\t\n\b\f\r]", haystack)
for j in range(len(words)):
 if words[j] == needlesArr[i]:
 countArray[i] += 1
for j in range(len(needlesArr)):
 print needlesArr[j] + ": " + str(countArray[j])
findNeedles("the quick red fox jumped over the lazy brown dog", ['the'])

This method is used to find 'needles' or words in a haystack or array. Tt takes two parameters:

  • haystack: the haystack or string to search for the needles or words
  • needlesArr: the words to search for in the haystack The algorithm is as follows:
  • If the length of needles array (list) is greater than 5, it should do a stdout of too many words.
  • If not, then initialize an array countArray to keep track of the word counts.
  • Loop through all numbers from 0 to the length of needlesArr; create the words variable from the regex split function
  • Split the string haystack on any whitespace
  • Then loop through from 0 to the length of words, if the item at index j is in the original needlesArr at the same index, increase the value in countArray by 1.
  • Finally, print that item with its corresponding count.

    Improvement Suggestions

  • The solution above is not very efficient as it would have It would be an O(n) big O time complexity. . It is possible to do this in a single pass with a dictionary as demonstrated in the following solution:
import re
def findNeedles(haystack, needlesArr):
 count = dict.fromkeys(needlesArr, 0)
for word in re.split("[ \"\'\t\n\b\f\r]", haystack):
 if word in count:
 count[word] += 1
print(count)
findNeedles("the quick red fox jumped over the lazy brown dog", ['the'])
  • The above solution will enable the lookups to have O(1) instead of O(n) time complexity as you don't have to compare each word to every single needle