
[Java] Regex


Question

Hello,

I'm starting to learn Java and wanted to use regex. I've figured out the basics (matching, splitting etc). What I want to know is how to extract all the items in a string that appear between double quotes. For example, if I have the string

ABCD afjaf "Catch1" asfiha "Catch"

I want to be able to retrieve Catch1 and Catch.

Anyone know how this can be done?

Thanks


6 answers to this question


  • 0

Have a look at capture groups. Groups are defined in the regex with ( ).

So something like this:

.*"(.*)".*"(.*)"

would work for the above example. You can then retrieve the two captures, which will be Catch1 and Catch.

Calling .matcher(text) on your compiled pattern returns a Matcher object. After a successful matches() or find(), the groups are numbered sequentially: matcher.group(0) is the entire match, and matcher.group(1), matcher.group(2) etc. are the capture groups.
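
A minimal sketch of the group numbering, using the pattern and sample string from this thread. The class name GroupDemo and the helper groups() are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {

    // Returns {group(0), group(1), group(2)}, or null if the text does not match.
    // (Hypothetical helper, only for this illustration.)
    static String[] groups(String text) {
        Pattern p = Pattern.compile(".*\"(.*)\".*\"(.*)\"");
        Matcher m = p.matcher(text);
        // group() is only valid after a successful match attempt.
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] g = groups("ABCD afjaf \"Catch1\" asfiha \"Catch\"");
        System.out.println(g[0]); // the entire matched text
        System.out.println(g[1]); // Catch1
        System.out.println(g[2]); // Catch
    }
}
```

This only works here because the sample has exactly two quoted items, one per capture group in the pattern.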

Edited by spike232

  • 0

Hello, thanks for pointing me in the right direction. I think I am now having trouble with the regex itself.

What if I do not know the format/length of the string, and want to find all strings contained within "" quotations? Shouldn't the pattern

"(.*)"

do that?

I tried it and I keep finding only 1 match, and if I try using .group(0) I get an error:

Exception in thread "main" java.lang.IllegalStateException: No match found

although .groupCount() gives an answer of 1.
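
The failing code isn't shown, so this is an assumption about the cause: group() throws IllegalStateException if no match has been attempted yet, or if the last attempt failed. Note that matches() would fail here, because the whole string does not start and end with a quote. groupCount() only reports how many capture groups the pattern declares, so it returns 1 either way. A small sketch (the class and method names are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupStateDemo {

    private static final Pattern QUOTED = Pattern.compile("\"(.*)\"");

    // groupCount() counts the capture groups declared in the pattern.
    // It needs no match attempt and says nothing about how many matches exist.
    static int declaredGroups(String text) {
        return QUOTED.matcher(text).groupCount();
    }

    // Returns true if group(0) throws when called before any match attempt.
    static boolean groupThrowsBeforeFind(String text) {
        Matcher m = QUOTED.matcher(text);
        try {
            m.group(0);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Returns true if group(0) is readable once find() has succeeded.
    static boolean groupWorksAfterFind(String text) {
        Matcher m = QUOTED.matcher(text);
        return m.find() && m.group(0) != null;
    }

    public static void main(String[] args) {
        String text = "ABCD afjaf \"Catch1\" asfiha \"Catch\"";
        System.out.println(declaredGroups(text));        // 1
        System.out.println(groupThrowsBeforeFind(text)); // true
        System.out.println(groupWorksAfterFind(text));   // true
    }
}
```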


  • 0

For something more general, .* is far too greedy: in a string like the one above it would match everything from the first " to the last.

Try something more specific like

"([a-z0-9]*)"

Basically you need to specify all the accepted characters except for " so that it matches each quoted pair separately. (Also, unless you tell it to be case insensitive, you will need to include upper case too, e.g. [A-Za-z0-9].)
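
To make the difference concrete, here is a rough sketch that runs find() once with each variant on the sample string from the first post. The helper firstCapture() and the class name are hypothetical. The last variant, [^"]*, is a common alternative that accepts anything except a quote.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {

    // Returns group(1) of the first match, or null if nothing matches.
    // (Hypothetical helper, only for this illustration.)
    static String firstCapture(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "ABCD afjaf \"Catch1\" asfiha \"Catch\"";

        // Greedy .* runs from the first quote to the last one.
        System.out.println(firstCapture("\"(.*)\"", 0, text));
        // prints: Catch1" asfiha "Catch

        // Lower-case-only class: the capital C stops it, so nothing matches.
        System.out.println(firstCapture("\"([a-z0-9]*)\"", 0, text));
        // prints: null

        // Same class, case insensitive.
        System.out.println(firstCapture("\"([a-z0-9]*)\"", Pattern.CASE_INSENSITIVE, text));
        // prints: Catch1

        // Anything except a quote.
        System.out.println(firstCapture("\"([^\"]*)\"", 0, text));
        // prints: Catch1
    }
}
```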


  • 0

Try something like:

[^"]+"([^"]+)"[^"]+"([^"]+)"

It will start from the beginning and continue until it reaches a ". From there it puts everything it matches inside a capture group, continuing until it reaches another ". Then it does the same again for the 2nd quoted text.

In Java source you need to escape each " in the regex with a backslash, because the regex is written as a string literal.

Edit: Since I had Eclipse open.....:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

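public class Test {

	public static void main(String[] args) {

		// Each " in the regex is escaped because it sits inside a Java string literal.
		String regexp = "[^\"]+\"([^\"]+)\"[^\"]+\"([^\"]+)\"";
		String text = "ABCD afjaf \"Catch1\" asfiha \"Catch\"";
		Pattern p = Pattern.compile(regexp);
		Matcher m = p.matcher(text);
		// matches() succeeds only if the entire text fits the pattern.
		if (m.matches()) {
			// group(1) and group(2) hold the two quoted strings.
			System.out.println(m.group(1) + ", " + m.group(2));
		}
	}
}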

Prints: Catch1, Catch.

Edited by zer0day

  • 0

Happy New Year!

Thanks +zer0day, your code works great, but after looking more into regex and groups, I now understand that each (.*) is its own group index, which makes what I am trying to do impractical.

What I am trying to do is to extract all quotations (phrases surrounded by " " or ' ') from a paragraph. Since this number can vary from paragraph to paragraph, using groups seems impractical because I do not know how many quotations there are (if any at all).

At first I thought regex was the most practical way, but that does not seem so likely now. Does anyone have any suggestions on how I might extract the quotations?

Thanks


  • 0

["']([^"']*)["']

That's probably the easiest regex I could come up with.

That will search a paragraph and extract all the words in "one of these" or 'even some of these'.

Test Case :

Lorem 'ipsum' dolor sit amet, consectetur adipiscing elit. Integer enim ipsum, pharetra hendrerit, bibendum eget, tristique ac, mi. Integer sed lacus sed pede dignissim porta. Sed at magna at nibh mollis sodales. Nullam "condimentum". Cras vel velit vel ligula volutpat condimentum. Integer ullamcorper. Nunc diam nunc, semper id, imperdiet vitae, 'dictum' id, lacus. Aenean lectus lectus, elementum at, lobortis vel, convallis eget, magna. Quisque sollicitudin eleifend quam. Praesent sodales facilisis orci. Quisque varius elementum justo. Donec non diam. Ut pharetra urna at dui. Etiam semper ornare "risus".

Results :

  1. ipsum
  2. condimentum
  3. dictum
  4. risus
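
In Java, one way to apply that regex to an unknown number of quotations is to call find() in a loop: each call moves on to the next match, and group(1) holds the text between the quotes. A minimal sketch, with a shortened version of the test paragraph; the class name QuoteExtractor and the method extractQuoted() are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteExtractor {

    // The regex from the post above: a quote character, a captured run of
    // non-quote characters, then a quote character.
    private static final Pattern QUOTED = Pattern.compile("[\"']([^\"']*)[\"']");

    // Collects the contents of every quoted phrase, in order of appearance.
    static List<String> extractQuoted(String text) {
        List<String> results = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        // Each successful find() advances to the next match in the text.
        while (m.find()) {
            results.add(m.group(1));
        }
        return results;
    }

    public static void main(String[] args) {
        String sample = "Lorem 'ipsum' dolor sit amet. Nullam \"condimentum\". "
                + "Nunc diam, 'dictum' id, lacus. Etiam semper ornare \"risus\".";
        System.out.println(extractQuoted(sample));
        // prints: [ipsum, condimentum, dictum, risus]
    }
}
```

Two limits of the pattern as written: it accepts a " that is closed by a ' (and vice versa), and an apostrophe inside a word such as don't counts as a quote character, so text with apostrophes can produce wrong results.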

