Find The Needles: Python Version I took a stab at explaining a search algorithm written in python with some suggested improvements.
def findNeedles(haystack, needlesArr):
if len(needlesArr) > 5:
sys.stderr.write('Too many words!')
else:
countArray = [0]*len(needlesArr)
for i in range(len(needlesArr)):
words = re.split("[ \"\'\t\n\b\f\r]", haystack)
for j in range(len(words)):
if words[j] == needlesArr[i]:
countArray[i] += 1
for j in range(len(needlesArr)):
print needlesArr[j] + ": " + str(countArray[j])
findNeedles("the quick red fox jumped over the lazy brown dog", ['the'])
This method is used to find 'needles' or words in a haystack or array. Tt takes two parameters:
- haystack: the haystack or string to search for the needles or words
- needlesArr: the words to search for in the haystack The algorithm is as follows:
- If the length of needles array (list) is greater than 5, it should do a stdout of
too many words
. - If not, then initialize an array
countArray
to keep track of the word counts. - Loop through all numbers from 0 to the length of
needlesArr
; create thewords
variable from the regex split function - Split the string haystack on any whitespace
- Then loop through from 0 to the length of
words
, if the item at indexj
is in the originalneedlesArr
at the same index, increase the value incountArray
by 1. - Finally, print that item with its corresponding count.
Improvement Suggestions
- The solution above is not very efficient as it would have It would be an
O(n)
big O time complexity. . It is possible to do this in a single pass with a dictionary as demonstrated in the following solution:
import re
def findNeedles(haystack, needlesArr):
count = dict.fromkeys(needlesArr, 0)
for word in re.split("[ \"\'\t\n\b\f\r]", haystack):
if word in count:
count[word] += 1
print(count)
findNeedles("the quick red fox jumped over the lazy brown dog", ['the'])
- The above solution will enable the lookups to have
O(1)
instead ofO(n)
time complexity as you don't have to compare each word to every single needle