I am attempting to parse strings using Regex. The strings look like:
Stack;O&verflow;i%s;the;best!
I want to parse it to:
Stack&verflow%sbest!
So when we see a ; I want to remove it and everything after it, up to (but not including) the next one of the following characters: [;,)%&@] (that is, replace that span with the empty string "").
I am using the re package in Python:
string = re.sub('^[^-].*[)/]$', '', string)
This is what I have right now:
^[^;].*[;,)%&@]
Which, as I understand it, says: starting at a ; match everything between that ; and one of the [;,)%&@] characters.
But the result is wrong and looks like:
Stack;O&verflow;i%s;the;
What am I missing?
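To make the symptom reproducible, here is a minimal sketch of what that pattern does on the sample string. The variable names are just for illustration. As far as I can tell, ^ anchors the match at the start of the string, [^;] consumes one character that is not a semicolon (it does not mean "start at ;"), and the greedy .* then runs to the last character from the set:

```python
import re

s = 'Stack;O&verflow;i%s;the;best!'
pattern = r'^[^;].*[;,)%&@]'

# What the pattern actually matches: from the first character
# up to the LAST character in [;,)%&@], because .* is greedy.
matched = re.search(pattern, s).group(0)
print(matched)    # Stack;O&verflow;i%s;the;

# What re.sub leaves behind after deleting that single match.
remaining = re.sub(pattern, '', s)
print(remaining)  # best!
```

So the text shown above as the "result" is the span the pattern matches, not a series of small removals after each ; as intended.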
EDIT: @InSync pointed out that there is a discrepancy if ; is among the end characters as well. As worded above, it should result in Stack&verflow%s;best! but instead I want to see Stack&verflow%sbest!. Perhaps two regex lines are appropriate here, I am not sure; if you can get to Stack&verflow%s;best! then the rest is just a simple replacement of all the remaining ; characters.
EDIT2: The code I found that works was
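import re

def remove_semicolons(name):
    # Remove each ; plus the text after it, up to (not including) the next delimiter.
    name = re.sub(';.*?(?=[;,)%&@])', '', name)
    # Remove any semicolons that are still left over.
    name = re.sub(';', '', name)
    return name

remove_semicolons('Stack;O&verflow;i%s;the;best!')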
Or if you feel like causing a headache to the next programmer who looks at your code:
import re

semicolon_string = 'Stack;O&verflow;i%s;the;best!'
cleaned_string = re.sub(';', '', re.sub(';.*?(?=[;,)%&@])', '', semicolon_string))
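For what it is worth, the two substitutions can probably be folded into a single pass. This is only a sketch under the assumption that the rule is "drop each ; and, if a delimiter from [;,)%&@] follows later, also drop the text up to that delimiter"; the function name strip_semicolon_runs is made up for this example:

```python
import re

# ;                      the semicolon itself is always removed
# (?: ... )?             optionally also remove the text after it,
# [^;,)%&@]*             i.e. a run of non-delimiter characters,
# (?=[;,)%&@])           but only when a delimiter really follows.
SEMICOLON_RUN = re.compile(r';(?:[^;,)%&@]*(?=[;,)%&@]))?')

def strip_semicolon_runs(name):
    # Hypothetical helper: one substitution instead of two.
    return SEMICOLON_RUN.sub('', name)

result = strip_semicolon_runs('Stack;O&verflow;i%s;the;best!')
print(result)  # Stack&verflow%sbest!
```

The trailing ;best! has no delimiter after it, so the optional group matches nothing there and only the ; is removed, which is the same outcome as the two-step version.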