Channel: Recent Questions - Stack Overflow

In PySpark, how do you properly split strings based on multiple delimiters?


I am trying to count words in PySpark. To do that, I split each line of the input into words using a list of delimiters. Here is the code I have:

while(i < 18):
    splitted_words = splitted_words.flatMap(lambda line: line.split(delimiters[i]))
    i += 1

I am attempting to iterate through all of the delimiters and split the words on each of them in turn. Whenever I print the output, delimiters such as "_" or "(" are still present. Is there a better way of doing this?
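One possible cause, offered as a guess since the full program is not shown: flatMap is lazy, and the lambda looks up `i` when Spark finally runs the job, not when each loop iteration creates it, so every stage may end up using the same final value of `i`. A way to sidestep the loop entirely is to split on all delimiters in a single pass with a regular expression. The sketch below assumes a made-up `delimiters` list and a hypothetical helper name `split_words`; neither comes from the question.

```python
import re

# Hypothetical delimiter list; the question's actual `delimiters` is not shown.
delimiters = [" ", "_", "(", ")", ",", "."]

# Build one alternation that matches any single delimiter.
# re.escape keeps characters such as "(" or "." from being read as regex syntax;
# sorting longest-first matters only if some delimiters are prefixes of others.
pattern = re.compile(
    "|".join(re.escape(d) for d in sorted(delimiters, key=len, reverse=True))
)

def split_words(line):
    """Split a line on every delimiter and drop the empty strings
    that appear when two delimiters sit next to each other."""
    return [word for word in pattern.split(line) if word]

# With an RDD of lines this would be applied as (not executed here):
#   words = lines.flatMap(split_words)
```

Because `split_words` takes only the line as its argument and the pattern is built once up front, nothing in it depends on a loop counter that changes after the transformation is defined.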

