Sir Rugmuncher
Posted December 30, 2008

Hello,

I'm starting to learn Java and wanted to use regex. I've figured out the basics (matching, splitting, etc.). What I want to know is how to extract all items in a string that are between double quotes (""). For example, if I have the string

ABCD afjaf "Catch1" asfiha "Catch"

I want to be able to retrieve Catch1 and Catch.

Anyone know how this can be done?

Thanks

spike232
Posted December 30, 2008 (edited)

Have a look at capture groups. Groups are defined in the regex with ( ), so something like this:

.*"(.*)".*"(.*)"

would work for the above example. You can then retrieve the two captures, which will be Catch1 and Catch.

Calling .matcher(text) on your compiled pattern returns a Matcher object. Once a match has been attempted with matches() or find(), the capture groups are numbered sequentially: matcher.group(1), matcher.group(2), etc. matcher.group(0) is the entire match.

Edited December 30, 2008 by spike232

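To make the group numbering concrete, here is a minimal sketch using the pattern from the post above. The class and method names (GroupNumberingDemo, groups) are made up for illustration; only Pattern and Matcher come from the Java standard library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumberingDemo {
    // Returns {group(0), group(1), group(2)} for the two-quote pattern,
    // or null when the whole input does not match.
    static String[] groups(String text) {
        Pattern p = Pattern.compile(".*\"(.*)\".*\"(.*)\"");
        Matcher m = p.matcher(text);
        if (!m.matches()) {
            return null;
        }
        // group(0) is the entire match; capture groups start at index 1.
        return new String[] { m.group(0), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String text = "ABCD afjaf \"Catch1\" asfiha \"Catch\"";
        String[] g = groups(text);
        System.out.println("group(0) = " + g[0]); // the whole input string
        System.out.println("group(1) = " + g[1]); // Catch1
        System.out.println("group(2) = " + g[2]); // Catch
    }
}
```

This only works because the sample string contains exactly two quoted items; a string with a different number of quotes would need a different approach.
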
Sir Rugmuncher (Author)
Posted December 30, 2008

Hello, thanks for pointing me in the right direction. I think I am having trouble now with regex itself.

What if I do not know the format/length of the string, and want to find all strings contained within "" quotations? Shouldn't the pattern "(.*)" do that? I tried it and I keep finding only 1 match, and if I try using .group(0) I get an error:

Exception in thread "main" java.lang.IllegalStateException: No match found

although .groupCount() gives an answer of 1.

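The exception described above is what Matcher.group() throws when no match has been attempted yet (or the last attempt failed), while groupCount() only reports how many capture groups the pattern declares. A small sketch of that behaviour follows; the class and method names are hypothetical, and the sample input is made up.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchStateDemo {
    // True if calling group(0) on a fresh Matcher throws IllegalStateException.
    static boolean throwsBeforeFind(String text) {
        Matcher m = Pattern.compile("\"(.*)\"").matcher(text);
        try {
            m.group(0); // no find()/matches() has been called yet
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Returns the first quoted item, or null if there is none.
    static String firstQuoted(String text) {
        Matcher m = Pattern.compile("\"(.*)\"").matcher(text);
        // find() must succeed before group() may be called.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "say \"hi\" now";
        // groupCount() counts the groups in the pattern, not the matches found.
        System.out.println(Pattern.compile("\"(.*)\"").matcher(text).groupCount()); // 1
        System.out.println(throwsBeforeFind(text)); // true
        System.out.println(firstQuoted(text));      // hi
    }
}
```
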
spike232
Posted December 31, 2008

For something more general, the .* is far too greedy: in a string like the above it would return everything from the first " to the last.

Try something more specific like

"([a-z0-9]*)"

Basically, you need to specify all the accepted characters except for " so that it matches each pair separately. (Also, unless you tell it to be case insensitive, you will need to include upper case too.)

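The difference between the greedy pattern and the restricted one can be seen by running both with find() in a loop. This is a sketch only: the helper name captures and the class name are invented for illustration, and Pattern.CASE_INSENSITIVE is used here as one way to cover upper case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsRestrictedDemo {
    // Collects group(1) of every match of regex in text.
    static List<String> captures(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "ABCD afjaf \"Catch1\" asfiha \"Catch\"";

        // Greedy: one match running from the first quote to the last.
        System.out.println(captures("\"(.*)\"", 0, text));
        // prints [Catch1" asfiha "Catch]

        // Restricted character class: one match per quoted pair.
        System.out.println(captures("\"([a-z0-9]*)\"", Pattern.CASE_INSENSITIVE, text));
        // prints [Catch1, Catch]
    }
}
```
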
zeroday
Posted December 31, 2008 (edited)

Try something like:

[^"]+"([^"]+)"[^"]+"([^"]+)"

It starts from the beginning and continues until it reaches a ". From there it puts everything it matches into a capture group, continuing until it reaches another ". Then it does the same again for the second quoted text. In a Java string literal you will need to escape each " in the regex as \".

Edit: Since I had Eclipse open:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class test {
        public static void main(String[] args) {
            // Two capture groups, one per quoted item; [^\"] means "any character except a quote".
            String regexp = "[^\"]+\"([^\"]+)\"[^\"]+\"([^\"]+)\"";
            String text = "ABCD afjaf \"Catch1\" asfiha \"Catch\"";
            Pattern p = Pattern.compile(regexp);
            Matcher m = p.matcher(text);
            // matches() requires the whole string to fit the pattern.
            if (m.matches()) {
                System.out.println(m.group(1) + ", " + m.group(2));
            }
        }
    }

Prints: Catch1, Catch

Edited December 31, 2008 by zer0day

Sir Rugmuncher (Author)
Posted January 1, 2009

Happy New Year!

Thanks +zer0day, your code works great, but after looking more into regex and groups, I now understand that each (.*) is a separate group with its own index, which makes what I am trying to do impractical.

What I am trying to do is extract all quotations (phrases surrounded by " " or ' ') from a paragraph. Since the number of quotations can vary from paragraph to paragraph (there may be none at all), using one group per quotation seems impractical.

At first I thought regex was the most practical way, but now it does not seem so likely. Does anyone have any suggestions on how I might extract the quotations?

Thanks

Popcorned1
Posted January 1, 2009

["']([^"']*)["']

That's probably the easiest regex I could come up with. It will search a paragraph and extract all the words in "one of these" or 'even some of these'.

Test case:

Lorem 'ipsum' dolor sit amet, consectetur adipiscing elit. Integer enim ipsum, pharetra hendrerit, bibendum eget, tristique ac, mi. Integer sed lacus sed pede dignissim porta. Sed at magna at nibh mollis sodales. Nullam "condimentum". Cras vel velit vel ligula volutpat condimentum. Integer ullamcorper. Nunc diam nunc, semper id, imperdiet vitae, 'dictum' id, lacus. Aenean lectus lectus, elementum at, lobortis vel, convallis eget, magna. Quisque sollicitudin eleifend quam. Praesent sodales facilisis orci. Quisque varius elementum justo. Donec non diam. Ut pharetra urna at dui. Etiam semper ornare "risus".

Results:

ipsum
condimentum
dictum
risus

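In Java, a pattern with a single capture group like the one above can be applied repeatedly with Matcher.find(), so the number of quotations does not need to be known in advance. A minimal sketch, assuming a made-up class QuoteExtractor and method extractQuoted, and using a shortened version of the test paragraph:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteExtractor {
    // One capture group: the text between a pair of " or ' characters.
    private static final Pattern QUOTED = Pattern.compile("[\"']([^\"']*)[\"']");

    // Returns every quoted phrase in text, in order of appearance.
    static List<String> extractQuoted(String text) {
        List<String> quotes = new ArrayList<>();
        Matcher m = QUOTED.matcher(text);
        // Each successful find() moves on to the next match in the input.
        while (m.find()) {
            quotes.add(m.group(1));
        }
        return quotes;
    }

    public static void main(String[] args) {
        String text = "Lorem 'ipsum' dolor sit amet. Nullam \"condimentum\". "
                + "Nunc diam nunc, 'dictum' id, lacus. Etiam semper ornare \"risus\".";
        System.out.println(extractQuoted(text));
        // prints [ipsum, condimentum, dictum, risus]
    }
}
```

One caveat with this pattern: it accepts either quote character on either side, so an apostrophe in a word such as don't would be treated as an opening quote. Text containing apostrophes would need a stricter pattern.
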