• 0

Need help with RegEx


Question

Hi,

 

I have a set of sentences and need to weed out the bad ones and then extract the left part (the sentence) and the right part (number of minutes).

 

A is on phone from last 15min
B is on phone with A from last  15min
D is ready to talk to C now
D is 1min closer to talk to C now
E is waiting for D forever
F is waiting for G to pick up phone from last -12min
F is waiting for G to pick up phone from last 42 min
F is waiting for G to pick up phone from last min
F is waiting for G to pick up phone from last -12 min
F is waiting for G to pick up phone from last 0min
F is waiting for G to pick up phone from last 0 min
G will speak to F later

 

Rules

  • The sentence must end with either the number of minutes (i.e. "42min") or the word "now"
  • There must be at least 1 space before the number of minutes (i.e. "WHAT EVER THE SENTENCE IS 42min")
  • There cannot be any space between the number of minutes and the word "min" (i.e. "42min" is ok, "42 min" is not ok)
  • The number of minutes must be more than 5
  • Apart from the "42min" at the end, the rest of the sentence should not contain any numbers

So from the above set,

 

YUPS : A is on phone from last 15min

YUPS : B is on phone with A from last  15min

YUPS : D is ready to talk to C now
NOPE : D is 1min closer to talk to C now

NOPE : E is waiting for D forever

NOPE : F is waiting for G to pick up phone from last -12min

NOPE : F is waiting for G to pick up phone from last 42 min

NOPE : F is waiting for G to pick up phone from last min

NOPE : F is waiting for G to pick up phone from last -12 min

NOPE : F is waiting for G to pick up phone from last 0min

NOPE : F is waiting for G to pick up phone from last 0 min

NOPE : G will speak to F later

 

This is what I have come up with but it is not giving me the desired result and I am stuck. Any help is appreciated.

 

http://www.regexr.com/39dj6

 

Cheers :)


  • 0

Geeeez...

Okay, so first of all understand that regex works on characters in a string, and each piece of your regex is attempting to match one or more characters in the string. I'm saying this because I don't understand why you think it's right to stick a \w, which looks for a character permissible in a "word" (permissible characters including letters, numerical digits and underscore), right before the fixed letters 'now' or 'min'. You can see that this is not working because it's not picking up those sentences ending in 'now'. To be clear, this is because there is always a space character between the letters 'now' and the "word" character before it. The space does not match the permissible characters of a "word".

Additionally:

  • Square brackets let you specify a set of permissible (or unacceptable) characters to match against. For example, to allow the letters a to e you could write [abcde] or [a-e]. You can negate the set by starting it with a caret (^): [^a-e] matches any character other than a-e. To use the caret itself as a member of such a set, escape it with a backslash. Because the backslash is itself special (it changes the meaning of whatever follows it), a literal backslash must in turn be escaped with another backslash. Another example: [a-e0-3_] matches any letter a-e, any digit 0 to 3, or an underscore.
  • You can place symbols such as ? or * or + after things to control how many times you want it to match. ? means zero or one, * means zero or more, and + means one or more. You can use a set of curly braces to describe a specific minimum and maximum, e.g. {1,3} means between one and three instances. You can place a single number within curly braces, which means both min and max are this number, in other words there must be this precise number of instances.
  • If you want to control the number of instances of a sequence of things, wrap that sequence within brackets and place the instance control after it. E.g. /(abc )+/ would match "abc abc abc ". However, the regex engine remembers whatever is matched by the bit within a pair of brackets. This allows individual portions of the matched string to be captured and made available to you after the expression has been processed against the string. It also allows you to refer back to a matched piece of the string elsewhere in your regex (a backreference, written \1 for the first pair of brackets in most engines), for example /^([abc] )\1*$/ against the string "b b b b " would match, whereas "b b a b " would not. You can prevent a pair of brackets from remembering by including '?:' at the start of its contents.
  • You shouldn't be using \s to match the space character. \s matches whitespace, which could also match a tab for instance, which is not what you said you wanted. What you should use is an actual space character. To make it absolutely clear that you really meant to put a space in the regex, you could place it within square brackets [ ].
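The points in the list above are easy to try out. Here is a minimal sketch in Python (picked purely for illustration, since the thread is really about vb.net; the helper name `full` is made up for this example). It assumes Python's `re` module, whose syntax for these basics matches what is described above.

```python
import re

def full(pattern: str, text: str) -> bool:
    """Return True when the pattern matches the whole of text."""
    return re.fullmatch(pattern, text) is not None

# Character classes: a range, a negated range, and a mixed set.
assert full(r"[a-e]", "c")
assert not full(r"[a-e]", "f")
assert full(r"[^a-e]", "f")
assert full(r"[a-e0-3_]+", "b2_")

# Quantifiers: ? (zero or one) and {min,max}.
assert full(r"colou?r", "color")
assert full(r"a{1,3}", "aaa")
assert not full(r"a{1,3}", "aaaa")

# Grouping a sequence so that a quantifier applies to all of it.
assert full(r"(abc )+", "abc abc abc ")

# Backreference: \1 must repeat exactly what group 1 matched.
assert full(r"([abc] )\1*", "b b b b ")
assert not full(r"([abc] )\1*", "b b a b ")

# \w never matches a space, so "\wnow" cannot match "now" after a space.
assert re.search(r"\wnow$", "talk to C now") is None
assert re.search(r"[ ]now$", "talk to C now") is not None
```

If any of these assertions fails in another engine, that is a hint its syntax differs from what is described here.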

As an example to help you truly understand, try this against your example: /(\w+ )+(\w+)?/
This looks for one or more (+) "word" characters (\w, where letters numerical digits and underscores are permissible), followed by a space. There must be one or more blocks of that. Then there may be zero or one (?) of one or more "word" characters, this time not followed by a space. You could also tack on the $ symbol on the end to say end of string, to prevent it matching if anything else unmatched is on the end, and similarly the caret (^) symbol on the front to say start of string.
This matches everything except the portions of examples with a minus (-) symbol, which is not a permissible "word" symbol and not otherwise explicitly permitted to match with our regex. We could allow the final word sequence to begin with a - character with the following: /(\w+ )+([-]?\w+)?/ or allow it at the start of any word sequence in the sentence with: /([-]?\w+ )+([-]?\w+)?/
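To see the effect of the three variants just described, here is a small Python sketch (an illustration only; the constant names and the `whole_line` helper are made up). It uses `re.fullmatch`, which has the same effect as adding ^ and $ around the expression.

```python
import re

# The three expressions from the text above, without delimiters.
WORDS = r"(\w+ )+(\w+)?"
WORDS_MINUS_LAST = r"(\w+ )+([-]?\w+)?"
WORDS_MINUS_ANY = r"([-]?\w+ )+([-]?\w+)?"

def whole_line(pattern: str, line: str) -> bool:
    """True when the pattern covers the entire line."""
    return re.fullmatch(pattern, line) is not None

plain = "A is on phone from last 15min"
minus_last = "F is waiting for G to pick up phone from last -12min"
minus_inner = "F is waiting for G to pick up phone from last -12 min"

for line in (plain, minus_last, minus_inner):
    print(line)
    print("  words only       :", whole_line(WORDS, line))
    print("  minus on last    :", whole_line(WORDS_MINUS_LAST, line))
    print("  minus on any word:", whole_line(WORDS_MINUS_ANY, line))
```

Note that none of these enforce the actual rules yet; they only show how \w, the space and the optional minus interact.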

Matching a number that follows your requirement of a value of more than 5 might be more complex than you imagine, because regex works on patterns of characters; it doesn't extract and evaluate numbers to understand their value.

Let's try to build this...

  • Match a number of value greater than 5 (this also ensures no string of zeros preceding a lower value will match):
    0*([6-9]|[1-9][\d]+)
  • Require 'min' after it:
    0*([6-9]|[1-9][\d]+)min
  • Require a space before it:
    [ ]0*([6-9]|[1-9][\d]+)min
  • Require that it is the last thing within the string:
    [ ]0*([6-9]|[1-9][\d]+)min$
  • Allow the word 'now' instead:
    [ ](0*([6-9]|[1-9][\d]+)min|now)$
  • Match a "word" sequence before it:
    (\w+)[ ](0*([6-9]|[1-9][\d]+)min|now)$
  • Require instead that the word sequence cannot contain numerical digits, only letters:
    ([a-zA-Z]+)[ ](0*([6-9]|[1-9][\d]+)min|now)$
  • Capture all word sequences preceding Xmin or now, requiring that there be at least one:
    ([a-zA-Z]+[ ])+(0*([6-9]|[1-9][\d]+)min|now)$
  • Require that this cover the entire line:
    ^([a-zA-Z]+[ ])+(0*([6-9]|[1-9][\d]+)min|now)$
  • Improve capturing: Prevent capturing for blocks that we don't need to be remembered (for efficiency); and avoid unnecessary capture of the space at the end of the sentence:
    ^((?:[a-zA-Z]+[ ])*[a-zA-Z]+)[ ](?:0*([6-9]|[1-9][\d]+)min|now)$
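As a quick check of the final expression built above, here is a sketch in Python (not vb.net, so treat it only as an illustration; the names `STRICT`, `SAMPLES` and `accepted` are made up). It assumes each line has already been trimmed, because Python's $ would otherwise also match just before a trailing newline.

```python
import re

# The final expression from the build-up above, unchanged.
STRICT = re.compile(
    r"^((?:[a-zA-Z]+[ ])*[a-zA-Z]+)[ ](?:0*([6-9]|[1-9][\d]+)min|now)$"
)

SAMPLES = [
    "A is on phone from last 15min",
    "B is on phone with A from last  15min",  # two spaces before 15min
    "D is ready to talk to C now",
    "D is 1min closer to talk to C now",
    "E is waiting for D forever",
    "F is waiting for G to pick up phone from last -12min",
    "F is waiting for G to pick up phone from last 42 min",
    "F is waiting for G to pick up phone from last min",
    "F is waiting for G to pick up phone from last 0min",
    "G will speak to F later",
]

def accepted(line: str) -> bool:
    """True when the line satisfies the strict expression."""
    return STRICT.match(line) is not None

for line in SAMPLES:
    print("YUPS" if accepted(line) else "NOPE", ":", line)
```

One thing to be aware of: because the expression insists on exactly one space, the "B" line with two spaces before 15min is rejected, even though the original list marks it as acceptable.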

Applying this to each individual sentence, I'm not certain how exactly vb.net is going to return the results to you, but if this were PHP using its preg_match() function with the matches parameter, you'd get an array containing the full matched string in the first element, with the following elements containing the portion captured by each capture block. In this case the first of these would be the sentence before 'now' or 'Xmin', excluding the space between them. For those lines with Xmin instead of now, there will be a second one holding the number of minutes, which I would expect you may wish to convert from a string to an integer, which you could do with a cast: $value = (int) $string;
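For comparison, this is roughly what the same extraction looks like in Python. It is only a sketch of the idea (the function name `parse` is made up, and how vb.net hands back its groups may differ): group 1 is the sentence, group 2 is the number of minutes, and group 2 is absent when the line ends in 'now'.

```python
import re
from typing import Optional, Tuple

PATTERN = re.compile(
    r"^((?:[a-zA-Z]+[ ])*[a-zA-Z]+)[ ](?:0*([6-9]|[1-9][\d]+)min|now)$"
)

def parse(line: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return (sentence, minutes) for a valid line, or None for a bad one.

    minutes is None when the line ends with the word 'now'.
    """
    match = PATTERN.match(line)
    if match is None:
        return None
    sentence = match.group(1)
    minutes = match.group(2)  # None when the 'now' branch matched
    return sentence, (int(minutes) if minutes is not None else None)

print(parse("A is on phone from last 15min"))
print(parse("D is ready to talk to C now"))
print(parse("E is waiting for D forever"))
```

Since the leading zeros sit outside the capture block, a line ending in 007min yields the digits 7 only.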

Edit: replaced [5-9] with [6-9] - I thought you said five or more, but you actually said more than five!
Edit: various tiny tweaks to text discussion (not to regex) and replaced [0-9] with the slightly more simple [\d] in regex.
Edit: removed two unnecessary pairs of brackets in regex
Edit: removed delimiters, seems to result in slightly better syntax highlighting here without them
Edit: Leading zeros were allowed at the beginning of a number, except for 06, 07, 08 or 09. Fixed by using 0*[6-9].

Edit: Replaced a [\d]* that was essentially only to allow one or more zeros with the simpler form of 0*. Also switch the number checking around to do 0*[6-9] first rather than after the longer number check.

Edit: Remove unnecessary repeat of 0* check, and moved it outside of capture block so that leading zeros are no longer part of the number that is captured and returned.


  • 0

Ok so I had help from someone who came up with a solution. Would like to see if there are any other ways of doing this?

 

(^[^\d\r\n]+?)( +)(((([6789])|(\d{2,}))min\r?$)|(now\r?$))

 

I have updated the regexr site snippet.


  • 0

This seems no good to me...

 

First let me check with you a few things that I might need to improve in mine, based on reading yours

  1. In yours you're only allowing single or double digit numbers. You said nothing about such a restriction. Can you be clearer please on what the restrictions are here? Can it be only one or two digits, or more? What about a maximum value: assuming two digits max, is it permitted up to 99, or only to 59 or 60?
  2. I made an assumption that the sentence preceding this should only be made up of letters, with a single space between words. Is this correct, or are other symbols allowed also? You said no digits, and I assumed therefore only letters based on your example, but what about the rest, e.g. _.?@#=+-"!? etc, etc...
  3. Are end of line characters to be expected at the end of the string? You didn't say that they might be present, and I quite reasonably assumed not. Are you really REALLY sure they might be there?

Issues I have with yours:

  • It restricts single digit numbers correctly, however not multi-digit numbers. 00 for example is allowed which violates your greater than 5 rule.
  • You seem to have a bunch of unnecessary brackets in yours, and nothing to limit capturing (inefficient).
  • You're unnecessarily repeating \r?$ towards the end.
  • You're checking for \r at end. I've just asked above whether this is correct, but if it is, why are you only checking for \r, what about \n and \r\n?
  • You're allowing multiple spaces between the sentence and the time, why? That wasn't explicitly allowed in your requirements!
  • The reference to the beginning of the string/line does not need to be within that first capture block and I would argue it's bad practice to do so unless necessary.
  • You said that the sentence should not contain numbers, and as I said above, I made an assumption that you wanted words only made of letters, separated by a single space, and that's what mine is giving you. The portion of yours that checks the sentence portion just rejects \r \n and digits, and permits and captures everything else. What about other symbols and whitespace characters (like tabs)? These are permitted by yours! Should they be? I mean I could stick two spaces and nothing else before a time and yours would match it!
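A few of the issues listed above can be reproduced directly. The sketch below is in Python rather than vb.net and the test lines are made up for illustration; `LOOSE` is the expression quoted from the previous post and `STRICT` is the one from my earlier post.

```python
import re

# The expression quoted from the previous post.
LOOSE = re.compile(r"(^[^\d\r\n]+?)( +)(((([6789])|(\d{2,}))min\r?$)|(now\r?$))")
# The stricter expression from my earlier post.
STRICT = re.compile(
    r"^((?:[a-zA-Z]+[ ])*[a-zA-Z]+)[ ](?:0*([6-9]|[1-9][\d]+)min|now)$"
)

# Hypothetical lines that break the stated rules.
BAD_LINES = [
    "A is on phone from last 00min",  # value is not greater than 5
    "A is on phone from last 05min",  # value is not greater than 5
    "  8min",                         # "sentence" is nothing but a space
    "\t 9min",                        # "sentence" is a single tab
]

for line in BAD_LINES:
    loose_ok = LOOSE.search(line) is not None
    strict_ok = STRICT.search(line) is not None
    print(repr(line), "loose:", loose_ok, "strict:", strict_ok)
```

Both expressions still accept the straightforward good lines, so the difference only shows up on edge cases like these.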

Check out some additional example lines which I don't think you'd want matched, but which are matched by your regex:

http://regexr.com/39dre

 

Compare this to my one:

http://regexr.com/39dpc

 

Do by all means get back to me on the questions I asked above though, and I'll tweak it a little more to perfect it for you if necessary.

 

Edit: Also, do you require it to capture the word 'now', or are you going to be okay without that? This will depend on how you're programming it. If it matches the string, and you only get back one piece of captured information (the sentence) aside from the full string matched, is that enough? In other words, is the absence of a time value sufficient to tell you it was 'now', or do you really need the string 'now' to be captured and returned?

 

Edit: fixed typos

Edit: replaced link to mine to one with slightly tweaked regex from my previous post

Edit: removed non-ascii pound symbol (£) from extra examples I added to the regexr.com tests


  • 0

Oh wow! Please use POSIX character classes to make it more readable.

POSIX character classes are something I either wasn't aware of or I'd forgotten about (I haven't done much regex for a while).

I don't know whether vb.net supports them, but here you go (note that a POSIX class name is only valid inside a bracket expression, hence the doubled square brackets):

^((?:[[:alpha:]]+[ ])*[[:alpha:]]+)[ ](?:0*([6-9]|[1-9][[:digit:]]+)min|now)$

Edit: regexr.com doesn't seem to like that

Edit: Replaced a [\d]* that was essentially only to allow one or more zeros with the simpler form of 0*. Also switch the number checking around to do 0*[6-9] first rather than after the longer number check.

Edit: Remove unnecessary repeat of 0* check, and moved it outside of capture block so that leading zeros are no longer part of the number that is captured and returned.


  • 0

Wow thanks theblazingangel. Super help there.

 

I feel like a total newbie. In response to your questions,

 

1. Yes any number larger than 5 is allowed

2. Apart from the Xmin or now the rest of the sentence cannot contain any number, any other character is allowed

3. Each line will be processed individually so there won't be any line ending characters as it will be trimmed

 

I assume to fit the "#2" one would negate the characters [^0-9] for the sentence matching part. I tried retrofitting that but it didn't work so I missed something there obviously.

 

Cheers :)


  • 0

Ok, so first of all I just want to point out that I've made an additional couple of tiny tweaks to my posts above since you replied.

 

To make this, and your use of it, absolutely perfect, there are a few things we need to discuss in some detail.

 

Double escaping

 

I expect you already understand the use of the backslash in strings as an escape character to change the meaning of the character after it. For example if you wanted a " in a string enclosed within a pair of "" you need to write this as "\"", similarly to express a return or newline character "\r\n". For some, perhaps most languages you can alternatively use single quotes around the string to prevent it from interpreting such sequences as anything but text, though you have to avoid a ' itself being in such a string, or perhaps you can escape just that one character.

 

Expressing your regex as a string in a programming language, which you feed to a regex function, you need to be careful because you've got two points at which escape sequences will be processed (if you express within a pair of "" instead of '' at least). Firstly at the programming language level, during compilation (and dynamically in an IDE as you type to look for syntax errors), escape sequence processing will be performed upon the string you've written, with the result being compiled into the program. Then secondly, that result is what will be passed to the regex engine, which will look for further escape sequence patterns. Any escaping done in the regex pattern must then be doubly escaped when the pattern is placed within a string, to ensure that the regex engine actually gets the correct pattern.

 

E.g., where you want to match an actual backslash (\) character with regex, you must escape it as \\, but when that regex is placed within a string (enclosed in a pair of ""), this must become "\\\\" to ensure that the regex pattern given to the regex engine is \\. For example, $regex = "/^[^\\\\]+$/" (expressed in PHP) will match a string containing one or more characters that are anything other than a backslash.

 

In some programming languages you may have something called a "verbatim" string. In C#, for example, you place an @ character at the start, before the enclosing ", which tells the compiler not to process escape sequences, and thus you do not need to do any double escaping. So "\\\\" can instead be written as @"\\". Note that VB.NET string literals do not treat the backslash as an escape character in the first place, so a pattern can be written there as-is (only a " needs doubling).
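The same double escaping issue can be shown in Python, whose raw string literals (the r"..." prefix) play a role similar to the verbatim strings just described. This is only an illustrative sketch; the variable names are made up.

```python
import re

# Ordinary string literal: the language turns each \\ into one backslash,
# so four backslashes in source become two in the pattern the engine sees.
escaped = "^[^\\\\]+$"

# Raw string literal: backslashes reach the regex engine untouched.
raw = r"^[^\\]+$"

# Both spellings produce exactly the same pattern text.
assert escaped == raw

pattern = re.compile(raw)
print(pattern.match("no backslash here") is not None)
print(pattern.match("has a \\ in it") is not None)
```

The raw form is usually easier to read, which is the same argument as for verbatim strings.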

 

Integer sizes

 

So you've indicated that you want to be flexible and allow numbers of any length as long as the value is greater than five. I presume that once you've extracted such a number, you're going to want to convert that from a string to an integer. Even if you're a beginner at programming, you should already know (since it's one of the first things you should learn) that you have a small collection of different integers (short, int, long, etc), with each having a minimum and maximum value that they can hold.

 

Be aware that the numerical value expressed as a string may be too large to convert to an integer, and so make sure that you've got error checking in place to allow for cases where that may fail.

 

FAIL: A is on phone from last 4294967296min //conversion to unsigned 32-bit integer should fail, it's a value of 1 larger than the max

PASS: A is on phone from last 00000004294967295min //exactly equal to max for unsigned 32-bit integer, so should convert successfully, leading zeros are definitely ignored because they are not returned from the (updated) regex I wrote.
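Here is a sketch of that kind of error checking. It is written in Python, where integers never overflow, so an explicit range check stands in for the conversion failure you would get with a fixed-size integer type; the names `UINT32_MAX` and `to_uint32` are made up for the example.

```python
UINT32_MAX = 2**32 - 1  # 4294967295, the largest unsigned 32-bit value

def to_uint32(digits: str) -> int:
    """Convert a captured digit string, rejecting values that would not
    fit in an unsigned 32-bit integer."""
    value = int(digits)  # leading zeros are harmless here
    if not 0 <= value <= UINT32_MAX:
        raise OverflowError(f"{digits} does not fit in 32 bits")
    return value

for text in ("4294967295", "00000004294967295", "4294967296"):
    try:
        print(text, "->", to_uint32(text))
    except OverflowError as error:
        print(text, "-> rejected:", error)
```

In a language with fixed-size integers you would instead catch whatever error its own conversion routine reports.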

 

Allowed characters in first part of sentence

 

So currently my regex enforces that the first part of the sentence (everything except the Xmin or word now, and the space before it) is made up of:

  • Sequences of characters that are only in the range a-z or A-Z
  • Each of these sequences is separated by a single space

First question, is the single space enforcement acceptable? Do you want to allow more than one? Do you want to allow tabs, not just spaces?

 

Now, you've just said that you want to allow more characters than just basic (English) alphabetical characters. You tried to achieve this by using [^0-9] instead of [a-zA-Z], but there's a big problem with that (besides the 250ms timeout error regexr.com gives trying to process it):

  • Firstly this allows matching spaces and tabs, so it'll match the third invalid example I added to your list (the "  8mins").
  • You might have incorrectly assumed that [^0-9] would only allow punctuation and alphabetical characters. However, besides the space and tab just mentioned, the basic ASCII character encoding contains a total of 128 symbols, 33 of which are special control characters, including carriage return (\r), newline (\n), "form feed", "negative acknowledgement", "end of transmission", etc, which you may not want to allow.
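To make the point about [^0-9] concrete, the following Python sketch (illustrative only; the names are made up) lists the ASCII control characters and shows that a negated digit class happily accepts all of them, along with tabs and runs of spaces.

```python
import re

NOT_DIGIT = re.compile(r"[^0-9]+")

def accepted_by_negation(text: str) -> bool:
    """True when every character of text is 'anything but a digit'."""
    return NOT_DIGIT.fullmatch(text) is not None

# ASCII control characters: codes 0 to 31, plus 127 (delete).
CONTROL = [chr(code) for code in range(128) if code < 32 or code == 127]

print("control characters in ASCII:", len(CONTROL))
print("all accepted by [^0-9]:", all(accepted_by_negation(c) for c in CONTROL))
print("tab and spaces accepted:", accepted_by_negation("\t   "))
print("text with a digit accepted:", accepted_by_negation("a1"))
```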

If you want to explicitly exclude those special control characters, and thus just accept standard punctuation in addition to alphabetical letters, it would be better if we instead explicitly expressed the full set of what we want to allow. Now if POSIX character classes are supported in the regex engine used by the programming language you want to use this in, then we could do that fairly simply by saying [[:alpha:][:punct:]], but otherwise we need to express it explicitly. Note that a POSIX class name is only recognised inside a bracket expression, so even if you only need one of these you must write [[:alpha:]], not [:alpha:] (the latter is just a set containing the characters :, a, l, p and h).

 

Note that the set of permissible or non-permissible characters expressed inside of a pair of square brackets is called a "character class". Certain symbols with special meaning elsewhere, such as *, do not need escaping when inside of a character class, because they are treated differently there. The only characters with special meaning within a character class are the caret (^), the closing square bracket (]) and the dash (-). Furthermore, these three characters may or may not be interpreted as having special meaning, depending on where they occur in the character class. E.g. in [a-z] the dash means a range, in this case letters from a to z, but [az-] means letter a or letter z or a dash. We could escape the dash in the former as [a\-z] to mean the same thing; no escaping is needed for the latter. [^0-9] means anything other than 0-9, but [0-9^] means digit 0-9 or caret (^). The caret in the former would need escaping to mean the same thing, [\^0-9], but no escaping is needed in the latter. The closing square bracket is special and needs escaping, except if it is the very first character in the character class (an empty character class would be meaningless), so []a-z] means a closing square bracket or letter a-z, no escaping needed, but to add a closing bracket after the range you need to escape it, as in [a-z\]].
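These position rules can be checked one character at a time. The sketch below uses Python's regex engine, which happens to follow the rules as described here (other engines, such as the JavaScript one, treat a leading ] differently); the helper name `chars_matched` is made up.

```python
import re

def chars_matched(char_class: str, candidates: str) -> set:
    """Return the candidate characters that the character class accepts."""
    return {c for c in candidates if re.fullmatch(char_class, c)}

# Dash: a range in the middle, a plain character at the end or when escaped.
print(sorted(chars_matched(r"[a-z]", "amz-")))
print(sorted(chars_matched(r"[az-]", "amz-")))
print(sorted(chars_matched(r"[a\-z]", "amz-")))

# Caret: negation at the start, a plain character elsewhere or when escaped.
print(sorted(chars_matched(r"[^0-9]", "5^x")))
print(sorted(chars_matched(r"[0-9^]", "5^x")))
print(sorted(chars_matched(r"[\^0-9]", "5^x")))

# Closing bracket: plain when first, otherwise it needs escaping.
print(sorted(chars_matched(r"[]a-z]", "]b[")))
print(sorted(chars_matched(r"[a-z\]]", "]b[")))
```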

 

Update: according to wikipedia, the rules under the posix standard might be a little more strict than I thought. It states that the dash must be the first or last character if it is to be treated as just a character, and the closing bracket must be the first. Furthermore escaping should not be permitted or even necessary within a character class, so a single backslash should be interpreted as just a character. These rules are not properly being enforced by regexr.com, which is actually using the javascript regex engine built into your web browser and doesn't seem to be fully posix compliant; there you have to escape the closing square bracket and the backslash. I've updated the examples below with three options: one posix standard with posix shorthands for common character groups, one posix standard, and one with a format based on what regexr.com requires (though it doesn't actually seem to support things like subroutines and unicode that I discuss below).

 

So following those rules, we could use one of the following (the second and third spell the character set out explicitly, for engines where POSIX character classes are not supported):

// Posix Standard + Posix Character Class Shorthand Descriptors
^((?:[[:alpha:][:punct:]]+[ ])*[[:alpha:][:punct:]]+)[ ](?:0*([6-9]|[1-9][[:digit:]]+)min|now)$
// Posix Standard
^((?:[][a-zA-Z!"#$%&'()*+,./:;<=>?@\^_`{|}~-]+[ ])*[][a-zA-Z!"#$%&'()*+,./:;<=>?@\^_`{|}~-]+)[ ](?:0*([6-9]|[1-9][0-9]+)min|now)$
// For regexr.com (javascript regex engine)
^((?:[\][a-zA-Z!"#$%&'()*+,./:;<=>?@\\^_`{|}~-]+[ ])*[\][a-zA-Z!"#$%&'()*+,./:;<=>?@\\^_`{|}~-]+)[ ](?:0*([6-9]|[1-9][\d]+)min|now)$
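Since it is easy to miss a symbol when typing out such a long character class, here is a small Python sketch that checks the explicit class against the ASCII letters and punctuation. It is an illustration only: it uses the regexr.com style of the class (escaped ] and backslash), which Python's engine also accepts, and compares it with Python's own `string.punctuation` list, which I'm assuming is a fair stand-in for [:punct:].

```python
import re
import string

# The explicit character class from the regexr.com variant above.
TEXT = r'''[\][a-zA-Z!"#$%&'()*+,./:;<=>?@\\^_`{|}~-]'''

# Every ASCII character (codes 0 to 127) that the class accepts.
matched = {chr(code) for code in range(128) if re.fullmatch(TEXT, chr(code))}

# What we intend to allow: ASCII letters plus ASCII punctuation.
expected = set(string.ascii_letters + string.punctuation)

print("missing from class :", sorted(expected - matched))
print("unexpectedly allowed:", sorted(matched - expected))
```

Both lists coming out empty means the class allows exactly the letters and punctuation, with no digits, whitespace or control characters.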

We're repeating this long sequence of permitted characters twice because we want to capture all spaces between words, but not the space after the final one. To simplify this slightly, we can use something called subroutines. A set of rounded brackets () used to enclose something is called a capture block. As I described in a previous post, we can enable or disable the capture block from remembering what that portion of the expression matches against (this controls what bits of the string a regex matching function throws back at us). Also, we can refer to what was matched by any given capture block elsewhere in the expression as a variable, allowing us to find and match repeating sequences. Furthermore, we can refer to the expression within a given capture block from elsewhere in the pattern, to say "do the same expression here", saving us from typing it in again and simplifying the overall expression. We can refer to other capture blocks in such a way using numbers, offsets, or names; the latter requires setting a name on the capture block you wish to refer to.

 

So with this, we can simplify our expression using something like the following:

// Posix Standard + Posix Character Class Shorthand Descriptors
^((?:(?'text':[[:alpha:][:punct:]])+[ ])*(?&text:)+)[ ](?:0*([6-9]|[1-9][[:digit:]]+)min|now)$
// Posix Standard
^((?:(?'text':[][a-zA-Z!"#$%&'()*+,./:;<=>?@\^_`{|}~-])+[ ])*(?&text:)+)[ ](?:0*([6-9]|[1-9][0-9]+)min|now)$
// For regexr.com (javascript regex engine), though subroutines not supported, so this won't actually work
^((?:(?'text':[\][a-zA-Z!"#$%&'()*+,./:;<=>?@\\^_`{|}~-])+[ ])*(?&text:)+)[ ](?:0*([6-9]|[1-9][\d]+)min|now)$

Or, if you're not allowed to use the colon character on the end (I was hoping it would keep the named block usable for what we need here while stopping these bits being thrown back to us in the results, but I'm not aware of an engine that documents that syntax), use:

// Posix Standard + Posix Character Class Shorthand Descriptors
^((?:(?'text'[[:alpha:][:punct:]])+[ ])*(?&text)+)[ ](?:0*([6-9]|[1-9][[:digit:]]+)min|now)$
// Posix Standard
^((?:(?'text'[][a-zA-Z!"#$%&'()*+,./:;<=>?@\^_`{|}~-])+[ ])*(?&text)+)[ ](?:0*([6-9]|[1-9][0-9]+)min|now)$
// For regexr.com (javascript regex engine), though subroutines not supported, so this won't actually work
^((?:(?'text'[\][a-zA-Z!"#$%&'()*+,./:;<=>?@\\^_`{|}~-])+[ ])*(?&text)+)[ ](?:0*([6-9]|[1-9][\d]+)min|now)$

Note that different regex engines and different versions may or may not support this, and may vary on the syntax. I can't seem to get the website we're using for testing to accept any such form of this. Perhaps you'll have better luck with vb.net's engine.
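If subroutines turn out not to be available, a different technique gives the same saving in typing: build the pattern from smaller strings in the host language before handing it to the regex engine. To be clear, this is plain string composition and not a subroutine call. The sketch is in Python (whose standard `re` module has no subroutine support either), and the names `TEXT`, `MINUTES` and `PATTERN` are made up.

```python
import re

# The long character class, written once (regexr.com style escaping).
TEXT = r'''[\][a-zA-Z!"#$%&'()*+,./:;<=>?@\\^_`{|}~-]'''
# The "more than 5 minutes" part, with its capture block for the number.
MINUTES = r"0*([6-9]|[1-9][0-9]+)min"

# Stitch the pieces together into the full expression.
PATTERN = re.compile(
    "^((?:" + TEXT + "+[ ])*" + TEXT + "+)[ ](?:" + MINUTES + "|now)$"
)

for line in ('Hello, "world"! 12min', "wait... what? now", "C3PO is ready now"):
    match = PATTERN.match(line)
    print(repr(line), "->", match.groups() if match else None)
```

The engine still receives the fully expanded expression, so this only tidies up the source code, it does not change what is matched.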

 

Don't forget to use these in a verbatim string, or otherwise apply any necessary double escaping!

 

Beyond the basic ASCII character set - Unicode

 

One symbol I noticed (being British) that isn't in the ASCII character set is the pound sterling symbol (£). Another is the euro symbol. As such, sentences containing such symbols will fail to match. Furthermore any letter beyond the basic English a-zA-Z, such as those with accents, will fail to match and thus be unsupported by your expression. Furthermore, trying to simply add the pound (£) symbol as acceptable breaks the regex expression (at least on that website we're testing it on).

 

Do you want to expand the character set that you're allowing? If so then we need to delve into Unicode, which will require a regex engine that supports it, and which will probably require you to supply an extra switch/flag to enable it. Then we need to change the regex to something like the following:

// Engine with Unicode property support (e.g. PCRE)
^((?:(?:\p{L}\p{M}*+|\p{S}|\p{P})+[ ])*(?:\p{L}\p{M}*+|\p{S}|\p{P})+)[ ](?:0*([6-9]|[1-9][\d]+)min|now)$

or if we can get named subroutines working, something like the following:

// Engine with Unicode property support and named subroutines (e.g. PCRE)
^((?:(?'text'\p{L}\p{M}*+|\p{S}|\p{P})+[ ])*(?&text)+)[ ](?:0*([6-9]|[1-9][\d]+)min|now)$

http://www.regular-expressions.info/unicode.html
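Where an engine lacks \p{...} support, one workaround is to let the regex handle only the tail of the line and check the sentence part against Unicode categories in code. The Python sketch below does that with the standard `unicodedata` module (Python's built-in `re` does not accept \p{L}). It is a simplification rather than an exact equivalent of the expressions above: it accepts any mix of letters (L), marks (M), symbols (S) and punctuation (P) per word, without requiring marks to follow a letter. The names `TAIL` and `parse` are made up.

```python
import re
import unicodedata
from typing import Optional, Tuple

# Only the end of the line is handled by the regex; group 1 is whatever
# comes before the final single space.
TAIL = re.compile(r"^(.*\S)[ ](?:0*([6-9]|[1-9][0-9]+)min|now)$")

def parse(line: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return (sentence, minutes) or None; minutes is None for 'now'."""
    match = TAIL.match(line)
    if match is None:
        return None
    sentence, minutes = match.group(1), match.group(2)
    for word in sentence.split(" "):
        # An empty word means two spaces in a row, which we reject.
        if not word:
            return None
        # Every character must be a letter, mark, symbol or punctuation.
        if any(unicodedata.category(ch)[0] not in "LMSP" for ch in word):
            return None
    return sentence, (int(minutes) if minutes is not None else None)
```

Digits fall into the N categories and tabs or other control characters into C, so both are rejected in the sentence part, while accented letters and currency symbols pass.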

 

The website we're using doesn't seem to support unicode. I'd search for an alternative, but I've got to go eat before my pizza burns in the oven, and I need to get on with my own stuff.

 

Edit: fixed broken code block and small typos

Edit: updated based on noticing that the posix standard might actually be a little stricter in regards to character class restrictions than I thought.


  • 0

Thanks again theblazingangel. Going beyond and above to help others makes this community amazing. Really appreciate your help there.

 

Yeah I noticed the timeout too so what I did was, I scanned through the input data I had and then added extra characters that I found in the data apart from a-z. So that solved it. Yeah there can be multiple spaces but the sample input data I have does not have that so I am not going to worry about it for now. As it stands I have got this working so I am very happy with this.

 

I was aware of string escaping and such as I am not new to programming but my programming experience contradicts my RegEx experience :D

 

Again appreciate your help.

 

P.S. : Hope the pizza turned out perfect.


  • 0

No problem. It did.

 

Not sure exactly what you mean about scanning through and adding extra characters to the input data and that solving the timeout. I did notice that if you reduce the number of example sentences for it to check in one go, that's another way to avoid the timeout though.

 

So if you want to allow multiple spaces, just replace the two instances of [ ] with [ ]+, or if you want to allow tabs as well instead of just spaces you can replace with [ \t]+ or [\s]+
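Here is what that substitution looks like applied to the letters-only expression from my earlier post, as a Python sketch (illustrative only; the constant names are made up). The sample with two spaces before 15min is the one from the original question.

```python
import re

NUMBER_OR_NOW = r"(?:0*([6-9]|[1-9][\d]+)min|now)$"

# Exactly one space between words and before the time.
SINGLE = re.compile(r"^((?:[a-zA-Z]+[ ])*[a-zA-Z]+)[ ]" + NUMBER_OR_NOW)
# One or more spaces.
MULTI = re.compile(r"^((?:[a-zA-Z]+[ ]+)*[a-zA-Z]+)[ ]+" + NUMBER_OR_NOW)
# One or more spaces or tabs.
MULTI_TABS = re.compile(r"^((?:[a-zA-Z]+[ \t]+)*[a-zA-Z]+)[ \t]+" + NUMBER_OR_NOW)

two_spaces = "B is on phone with A from last  15min"
with_tabs = "A is on\tphone from last\t15min"

for name, pattern in (("single", SINGLE), ("multi", MULTI), ("tabs", MULTI_TABS)):
    print(name, bool(pattern.match(two_spaces)), bool(pattern.match(with_tabs)))
```

The captured sentence still excludes the spaces right before the time, although any extra spaces between words are kept inside it.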

 

Also, please remember to mark solved question threads in tech areas as such. It's not a big deal here, but it might at least help future forum visitors spot threads in which they are more likely to find useful answers/info for their own problems :)


  • 0

Update:

  • I updated the last big post of mine above because I became aware that the posix standard might actually be a little stricter in regards to character class restrictions than I thought.
  • I realised that regexr.com is using the javascript regex engine built into your browser, which doesn't seem to be fully posix compliant, at least not here for me on Firefox. I've updated the examples above where applicable to give alternate versions, ones for regexr.com and ones for use with posix standards compliant regex engines.

  • 0

Thanks. Appreciate your help.

 

P.S. What I meant by scanning the sample input I had was to check what other characters apart from a-z it had and add them individually rather than doing a negate [^0-9].
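For anyone wanting to repeat that approach, it could be sketched along these lines in Python. This is a guess at the idea rather than the actual code used: the function name `build_pattern` and the sample lines are hypothetical, and in practice you would review the collected characters by hand before trusting them.

```python
import re
import string

BASE = set(string.ascii_letters + string.digits + " ")

def build_pattern(sample_lines):
    """Collect every printable character in the samples that is not an
    ASCII letter, digit or space, and allow exactly those (plus letters)
    in the sentence part of the expression."""
    extras = sorted(
        {ch for line in sample_lines for ch in line
         if ch not in BASE and ch.isprintable()}
    )
    # re.escape keeps characters such as ] ^ - \ literal inside the class.
    char_class = "[a-zA-Z" + "".join(re.escape(ch) for ch in extras) + "]"
    return re.compile(
        "^((?:" + char_class + "+[ ])*" + char_class + "+)[ ]"
        "(?:0*([6-9]|[1-9][0-9]+)min|now)$"
    )

# Hypothetical sample data containing brackets, an apostrophe and a comma.
pattern = build_pattern([
    "A (the boss) is on phone from last 15min",
    "B isn't ready, sorry now",
])
print(bool(pattern.match("C (also) isn't here now")))
print(bool(pattern.match("D said: hi now")))
```

A character that never appeared in the samples, such as the colon in the last line, is rejected, which is the trade-off compared with negating [^0-9].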


This topic is now closed to further replies.