Channel: c# Regex to remove single characters and orphaned spaces when input is unknown and can contain multiple words - Stack Overflow

c# Regex to remove single characters and orphaned spaces when input is unknown and can contain multiple words


This is similar to "OR condition in Regex" and many other closely related questions ...

I have an OCR program that reads labels off pictures. Some of the bits cause small errors: single characters in odd places. All the labels will have at least 2 letters, and any wrong letters will be space-padded, at least trailing and maybe leading.

GIVEN :

  • m Rose
  • a a m a this test b c z ^ @
  • k This Bigger k
  • Great m z
  • One Big Good Word This IS About AS LRG Possible and good one

DESIRED :

  • Rose
  • this test
  • This Bigger
  • Great
  • One Big Good Word This IS About AS LRG Possible and good one

How do I get rid of the oddball singles in C#? I have been trying for hours with single and multiple Regex.Replace calls but am getting nowhere.

str = Regex.Replace(str2, @"([0-9a-zA-Z]{1}) ([0-9a-zA-Z]{2,100})?","$2", RegexOptions.Multiline);

This gets close, but it truncates a letter and the space between words, so "Open Hours" becomes "OpeHours". I am happy to replace with spaces and then use another line to get rid of them. I am just not getting the multiple words out, since the lengths and occurrences are random and my regex skill is average at best. It just seems there should be a one-liner for this without having to split and reassemble.

...I am after regex for a reason. I know I could loop through the string and look for spaces before and after, or use other string voodoo ...
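
One possible approach, offered as a sketch rather than a definitive answer: the attempt above seems to produce "OpeHours" because nothing anchors `[0-9a-zA-Z]{1}` to the start of a token, so it can match the last letter of "Open" plus the following space. Lookarounds can express "a single character standing alone" instead. The sketch assumes a "single" is any one non-whitespace character with whitespace or a string edge on both sides (so `^` and `@` count too, as in the GIVEN list). The names `LabelCleaner` and `RemoveSingles` are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LabelCleaner
{
    // (?<!\S)   : not preceded by a non-whitespace char (token start or string start)
    // \S        : exactly one non-whitespace char (assumption: symbols count as singles)
    // (?:\s+|$) : followed by whitespace, which is consumed, or by the end of the string
    private static readonly Regex Single = new Regex(@"(?<!\S)\S(?:\s+|$)");

    // Hypothetical helper: drops stand-alone characters, then trims leftover edge spaces.
    public static string RemoveSingles(string input)
    {
        return Single.Replace(input, string.Empty).Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        string[] labels =
        {
            "m Rose",
            "a a m a this test b c z ^ @",
            "k This Bigger k",
            "Great m z",
            "Open Hours"
        };

        foreach (string label in labels)
        {
            // Brackets make any leftover leading or trailing spaces visible.
            Console.WriteLine("[" + LabelCleaner.RemoveSingles(label) + "]");
        }
    }
}
```

Because the match consumes the whitespace after each single, words on either side stay separated by the space that preceded it, and the final `Trim()` only has to clean up the ends. Runs of several spaces between real words are left untouched; collapsing those would need a second replace such as `\s+` to a single space.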

