• 0

Determine if string can be split up into words


Question

I need to write an algorithm using dynamic programming to take an input in the form of a string without spaces or punctuation and return

a) if it can be split up into a sequence of words. "themailer" -> true; "thdfapple" -> false

b) the sequence produced. "the", "mailer"

In the dynamic programming approach, you have to find a subproblem... I've tried several with no avail... i feel like I'm closer right now with saying the subproblem is the longest word adjacent to the current position i'm testing

My general approach is to fill a table that is n x n long where n is the length of the string, the horizontal axis is the START of the substring and the vertical axis is the END of the substring (that i am testing)

Can anyone offer some insight / help on doing this? I realize this is highly esoteric and probably not explained very well... Sorry

9 answers to this question

Recommended Posts

  • 0

Hmm, that looks to be quite a challenge. You'd need some sort of dictionary to compare valid words to, but it might get a little interesting when you encounter words which could exist within other words, e.g. "ten" is a valid word, but so is "tenth". Unless you have a defined set of words and your work backwards (longest -> shortest) and break the string down that way? Never done anything like this before, so it would be interesting to see how you do it...

  • 0

If you want to do this with DYNAMIC PROGRAMMING you basically have to keep pointers of where your current words start/end and try to find a matching item. If the last word doesn't fit anywhere then try to move the previous one and so on.

  • Like 1
  • 0
  On 03/03/2010 at 20:38, Argote said:

If you want to do this with DYNAMIC PROGRAMMING you basically have to keep pointers of where your current words start/end and try to find a matching item. If the last word doesn't fit anywhere then try to move the previous one and so on.

That sounds like a good approach! Rather you than me though! Sounds very tricky indeed :rolleyes:

  • 0

The constraint is that it has to be done in O(n^2) time. Not quite sure what you mean Argote, but if your algorithm has to go back, then you might end up with n^n time because for every letter, you can have n mistakes.

I'm glad you seem to be familiar with dynamic programming though. I think I have a solution; perhaps you can verify that what I have runs in O(n^2) and is "dynamic programming"

pseudo python... the check to see if something is a word is considered to take unit time (that's just how it is.)

for start in range(0, length):
     for end in range(start+1, length):
         isWord = #check the current substring based on start and end.
         if isWord: table[start][end] = substring length

This should take n^2 time, according to a source, but it takes a little less since it only covers half of a table

due to the fact that end >= start is a constraint. This basically goes through all the substrings:

[a]bcd

[ab]cd

[abc]d

acd

a[bc]d (etc)

and when it hits one that is a word, it puts the length of that word into that cell (so to get the word, you trace the table UP that many rows)

I then propose to do a DEPTH FIRST SEARCH from the end of this table back, (as if it were a graph).

If I can get to the start state, then I can split the string up (and will have actually built it by then).

Depth First Search is O(|V| + |E|)... |V| = n, and |E| seems to be constrained to be n(n-1)=n^2-n in this case.

You have n nodes, and each node can have n-1 edges since it's basically an undirected graph.

so I have O(n^2 + n^2 -n) = O(n^2) runtime, I THINK....

please feel free to correct me or comment on it. I'm not sure if this is actually dynamic programming since I have to do a multipass (the DFS after I fill in the table)

This is my test string, by the way.... I hammered it out by combining interesting cases... it covers things such as choices like "than" or "thank" or "an"... "mail" or "mailer" or "email"

  thankillthemailer
t 00000000000000000
h 00000000000000000
a 00000000000000000
n 40000000000000000
k 50000000000000000
i 00000000000000000
l 00000000000000000
l 00004300000000000
t 00000000000000000
h 00000000000000000
e 00000000320000000
m 00000000430000000
a 00000000000000000
i 00000000000000000
l 00000000005400000
e 00000000000000000
r 00000000000600000

so the columns represent the START and the rows represent the END from my algorithm, so if you look at table['k','l'], you get a 4, which means that the 'l' is the end of a 4 letter word. (yes, i realize there are two l's, but it's easier to explain that way. i'm obviously talking about the second l)

  • 0

There is a much less complex and more elegant solution.

You propose to finish with a DFS working backwards to locate all solutions (paths through the directed graph back to the start). This itself is very close to the more elegant solution I will attempt to describe. We are indeed going to be starting at the end of the string rather than the beginning.

Let us call a string that has been successfully split into words a "sentence". Consider that every sentence must end with a word. Working from the end of the string forward, we will construct all valid sentences recursively.

Rather than iterating through all end and then all start positions, let us consider our end position static - we simply always end with the end of our "current" string as we are not interested in words ending before the end of our current string because those solutions don't provide a valid word at the end of the sentence. We will thus iterate only the start position, working backwards from end-1 to 0. If we ever get to start = 0 without finding a valid word, then there aren't any valid sentences because there are no valid words that the sentence can end with. We don't need to iterate again with a new end position, that would be wasted computation.

As we work backwards by start position, each time we encounter a valid word we will recurse with the substring preceding the word, and then prepend each result from the recursion to our "current" valid word to generate an array of our valid solution sentences. We don't recurse if start is 0 as we have reached the beginning of the string and can simply add the valid word to our array of solutions and return.

In pseudo-code (I have a running solution in PHP that I can generalize and post if there is interest, though it is much better left as an exercise for the reader)

 function find_sentences(string) {
     sentences = [];
     end = length(string) - 1;
     for (start = end; start >= 0; start--) {
         sublength = end - start + 1;
         substring = substring(string, start, sublength);
         if (is_word(substring)) {
             if (start > 0) {
                 next = substring(string, 0, start);
                 recurse = find_sentences(next);
                 foreach (recurse as result) {
                     sentences[] = result + ' ' + substring;
                 }
             } else {
                 sentences[] = substring;
             }
         }
     }
     return sentences;
 }

Enjoy! Any questions, I'll be happy to answer as best I can.

Also, thanks to ArmedMonkey - when I originally began on a solution for a similar problem, it was your solution that 'tipped' me off to the solution I described above.

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • Do you have a 365 account? I should have been more clear, I mean free accounts.
    • What?! "May 31 2024 knowledge cutoff"?
    • Amazon Alexa+ now has more than a million users by Aditya Tiwari Amazon's muscled-up voice assistant, Alexa+, has reached a new milestone. A company spokesperson told The Verge that Alexa+ has now crossed one million users. The e-commerce giant introduced Alexa+ earlier this year as its generative AI offering. Why? It's a new trend, and everyone is doing it. According to the company, Alexa Plus offers more natural and free-flowing conversations than its predecessor. You can speak half-formed thoughts using colloquial expressions, and the AI assistant should be able to understand you and provide an answer. Announcing its capabilities, Amazon previously said that you will be able to start a conversation on your Echo device and continue it on your phone, car, or computer. One million may not be a significant number when comparing it with the number of Alexa-enabled devices out there. Amazon revealed earlier this year that there are over 600 million Alexa devices globally. However, the number of Alexa+ users has increased from 'hundreds of thousands' in the previous month. The user base is not as big as that of other names like Gemini and ChatGPT because Amazon is still offering the generative AI assistant through an Early Access program, available to Prime and non-Prime members who own a compatible Echo device. We can find social media posts from different users who have been invited to try Alexa+. While there have been positive reviews from some, the road isn't buttery smooth for others. One user claimed that the early access Alexa+ has problems accessing some temperature sensors the previous version of Alexa would. "I also really dislike how it confidently will tell me something that is incorrect now instead of just saying it doesn't know like it used to tell me," the user added. The upgraded AI voice assistant will cost $19.99 per month, but is being offered for free to Prime subscribers. Alexa+ started rolling out in the US as part of its early access program. One reason why Amazon is giving Alexa+ a slow rollout is that the new devices and services chief, Panos Panay, wants to eliminate all the problems related to the generative AI assistant. Amazon's spokesperson told the publication that the early access program doesn't include features like brainstorming gift ideas, scheduling your next spa visit, ordering groceries hands-free, and jumping to your favorite scene on Fire TV. The program also doesn't offer the "new browser-based experience at Alexa.com," which would put Amazon's AI assistant in line with ChatGPT and Gemini. These missing features will be added in the coming weeks and months, as per the spokesperson, adding that almost 90% of the features are now a part of early access.
    • MSI's 32-inch 4K QD-OLED gaming monitor gets a big price cut for UK gamers and professionals by Paul Hill If you’re a gamer in the UK and looking for a monitor to upgrade to then check out the MSI MPG 321URX QD-OLED 31.5 Inch 4K UHD Gaming Monitor which you can now pick up for just 75% of its recommended retail price. The RRP of this monitor is £1,199, but thanks to this deal, you can get it for just £898.99 for a limited time (purchase link down below). With its 4K display, 240Hz refresh rate, and 0.03ms GTG, you’ll have the edge over other gamers by avoiding lag. At 31.5-inches, it’s the ideal monitor size if you’re sitting up close to it at a desk, you don’t want it too big at such a short range, but you also want to be able to see all the image details so 31.5-inches is a good balance. What makes QD-OLED stand out? There are loads of terms used to describe displays such as AMOLED, OLED, LED, and it can all get a bit confusing. This monitor adds yet another acronym called QD-OLED, which stands for Quantum Dot OLED. For you as a buyer, this means your new monitor has self-emitting pixels that deliver great black levels. It also features an enhanced sub-pixel arrangement for extra sharpness. The 31.5-inch 4K UHD monitor has a 3,840 x 2,160 pixel resolution making it ideal for playing games, but also watching movies in the best quality. Other important features worth mentioning are the 1.07 billion colors (10-bit) that the monitor can produce, its 99% DCI-P3 support, and DisplayHDR True Black 400 certification. All of these things make the monitor produce more accurate colours, potentially making it a good choice for professionals editing videos and photos too. Obviously, games will look good too. MSI has also packed in a fanless graphene heatsink which should help to increase the durability of the monitor long-term. This could extend the time until you need to buy a new monitor, further justifying its almost £900 price tag. Gaming and productivity features It’s not just the hardware that makes this monitor excel for gaming, it also comes with great software enhancements and connectivity options. On the software side, you get the following features: Smart Crosshair: Projects a customizable crosshair onto the screen to improve hip-fire accuracy and iron sights in first-person shooter games. Optix Scope: Gives you a built-in aim magnifier with multi-stage zooming and shortcut keys to quickly switch magnification levels. AI Vision: This automatically enhances brightness and colour saturation, particularly in dark areas of the screen, making it easier to see enemies hiding in shadows or dark corners. If you have two separate systems you want to connect to the monitor at once, you can do so with this monitor thanks to KVM support. You can view both sources with Picture-in-Picture and Picture-by-Picture modes. The MSI MPG 321URX QD-OLED 31.5 Inch 4K UHD Gaming Monitor also supports next-gen consoles with features like HDMI CEC Profile Sync, HDMI Variable Refresh Rate (VRR), and 4K:4K downscaling. In terms of connectivity and ergonomics, you get DisplayPort 1.4a, 2x HDMI 2.1 (CEC), USB Type-C with 90W power delivery, and a USB hub. The monitor uses a tilt-, swivel- & height-adjustable stand that is VESA compatible. Should you buy this monitor? The MSI MPG 321URX QD-OLED 31.5 Inch 4K UHD Gaming Monitor is definitely a product for serious gamers looking for top-tier visual fidelity and performance or content creators who need accurate colours and high resolution. Even with the significant discount, it’s still at a premium price and definitely not for everyone. If you are in one of the groups mentioned, then you should give serious consideration to buying the MSI MPG 321URX QD-OLED 31.5 Inch 4K UHD Gaming Monitor as it's the lowest price the monitor has been at on Amazon to date. MSI MPG 321URX QD-OLED 31.5 Inch 4K Gaming Monitor: £898.99 (Amazon UK) / RRP £1,199 This Amazon deal is U.K. specific, and not available in other regions unless specified. If you don't like it or want to look at more options, check out the Amazon UK deals page here. Get Prime, Prime Video, Music Unlimited, Audible or Kindle Unlimited, free for the first 30 days As an Amazon Associate we earn from qualifying purchases.
  • Recent Achievements

    • Enthusiast
      computerdave91111 went up a rank
      Enthusiast
    • Week One Done
      Falisha Manpower earned a badge
      Week One Done
    • One Month Later
      elsa777 earned a badge
      One Month Later
    • Week One Done
      elsa777 earned a badge
      Week One Done
    • First Post
      K Dorman earned a badge
      First Post
  • Popular Contributors

    1. 1
      +primortal
      533
    2. 2
      ATLien_0
      273
    3. 3
      +FloatingFatMan
      201
    4. 4
      +Edouard
      200
    5. 5
      snowy owl
      138
  • Tell a friend

    Love Neowin? Tell a friend!