• 0

Determine if string can be split up into words


Question

I need to write an algorithm using dynamic programming to take an input in the form of a string without spaces or punctuation and return

a) if it can be split up into a sequence of words. "themailer" -> true; "thdfapple" -> false

b) the sequence produced. "the", "mailer"

In the dynamic programming approach, you have to find a subproblem... I've tried several with no avail... i feel like I'm closer right now with saying the subproblem is the longest word adjacent to the current position i'm testing

My general approach is to fill a table that is n x n long where n is the length of the string, the horizontal axis is the START of the substring and the vertical axis is the END of the substring (that i am testing)

Can anyone offer some insight / help on doing this? I realize this is highly esoteric and probably not explained very well... Sorry

9 answers to this question

Recommended Posts

  • 0

Hmm, that looks to be quite a challenge. You'd need some sort of dictionary to compare valid words to, but it might get a little interesting when you encounter words which could exist within other words, e.g. "ten" is a valid word, but so is "tenth". Unless you have a defined set of words and your work backwards (longest -> shortest) and break the string down that way? Never done anything like this before, so it would be interesting to see how you do it...

  • 0

If you want to do this with DYNAMIC PROGRAMMING you basically have to keep pointers of where your current words start/end and try to find a matching item. If the last word doesn't fit anywhere then try to move the previous one and so on.

  • Like 1
  • 0
  On 03/03/2010 at 20:38, Argote said:

If you want to do this with DYNAMIC PROGRAMMING you basically have to keep pointers of where your current words start/end and try to find a matching item. If the last word doesn't fit anywhere then try to move the previous one and so on.

That sounds like a good approach! Rather you than me though! Sounds very tricky indeed :rolleyes:

  • 0

The constraint is that it has to be done in O(n^2) time. Not quite sure what you mean Argote, but if your algorithm has to go back, then you might end up with n^n time because for every letter, you can have n mistakes.

I'm glad you seem to be familiar with dynamic programming though. I think I have a solution; perhaps you can verify that what I have runs in O(n^2) and is "dynamic programming"

pseudo python... the check to see if something is a word is considered to take unit time (that's just how it is.)

for start in range(0, length):
     for end in range(start+1, length):
         isWord = #check the current substring based on start and end.
         if isWord: table[start][end] = substring length

This should take n^2 time, according to a source, but it takes a little less since it only covers half of a table

due to the fact that end >= start is a constraint. This basically goes through all the substrings:

[a]bcd

[ab]cd

[abc]d

acd

a[bc]d (etc)

and when it hits one that is a word, it puts the length of that word into that cell (so to get the word, you trace the table UP that many rows)

I then propose to do a DEPTH FIRST SEARCH from the end of this table back, (as if it were a graph).

If I can get to the start state, then I can split the string up (and will have actually built it by then).

Depth First Search is O(|V| + |E|)... |V| = n, and |E| seems to be constrained to be n(n-1)=n^2-n in this case.

You have n nodes, and each node can have n-1 edges since it's basically an undirected graph.

so I have O(n^2 + n^2 -n) = O(n^2) runtime, I THINK....

please feel free to correct me or comment on it. I'm not sure if this is actually dynamic programming since I have to do a multipass (the DFS after I fill in the table)

This is my test string, by the way.... I hammered it out by combining interesting cases... it covers things such as choices like "than" or "thank" or "an"... "mail" or "mailer" or "email"

  thankillthemailer
t 00000000000000000
h 00000000000000000
a 00000000000000000
n 40000000000000000
k 50000000000000000
i 00000000000000000
l 00000000000000000
l 00004300000000000
t 00000000000000000
h 00000000000000000
e 00000000320000000
m 00000000430000000
a 00000000000000000
i 00000000000000000
l 00000000005400000
e 00000000000000000
r 00000000000600000

so the columns represent the START and the rows represent the END from my algorithm, so if you look at table['k','l'], you get a 4, which means that the 'l' is the end of a 4 letter word. (yes, i realize there are two l's, but it's easier to explain that way. i'm obviously talking about the second l)

  • 0

There is a much less complex and more elegant solution.

You propose to finish with a DFS working backwards to locate all solutions (paths through the directed graph back to the start). This itself is very close to the more elegant solution I will attempt to describe. We are indeed going to be starting at the end of the string rather than the beginning.

Let us call a string that has been successfully split into words a "sentence". Consider that every sentence must end with a word. Working from the end of the string forward, we will construct all valid sentences recursively.

Rather than iterating through all end and then all start positions, let us consider our end position static - we simply always end with the end of our "current" string as we are not interested in words ending before the end of our current string because those solutions don't provide a valid word at the end of the sentence. We will thus iterate only the start position, working backwards from end-1 to 0. If we ever get to start = 0 without finding a valid word, then there aren't any valid sentences because there are no valid words that the sentence can end with. We don't need to iterate again with a new end position, that would be wasted computation.

As we work backwards by start position, each time we encounter a valid word we will recurse with the substring preceding the word, and then prepend each result from the recursion to our "current" valid word to generate an array of our valid solution sentences. We don't recurse if start is 0 as we have reached the beginning of the string and can simply add the valid word to our array of solutions and return.

In pseudo-code (I have a running solution in PHP that I can generalize and post if there is interest, though it is much better left as an exercise for the reader)

 function find_sentences(string) {
     sentences = [];
     end = length(string) - 1;
     for (start = end; start >= 0; start--) {
         sublength = end - start + 1;
         substring = substring(string, start, sublength);
         if (is_word(substring)) {
             if (start > 0) {
                 next = substring(string, 0, start);
                 recurse = find_sentences(next);
                 foreach (recurse as result) {
                     sentences[] = result + ' ' + substring;
                 }
             } else {
                 sentences[] = substring;
             }
         }
     }
     return sentences;
 }

Enjoy! Any questions, I'll be happy to answer as best I can.

Also, thanks to ArmedMonkey - when I originally began on a solution for a similar problem, it was your solution that 'tipped' me off to the solution I described above.

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • Trying to make me feel bad for taking issue with your website pushing the replacement of real authors as an opportunity to make "passive income" is a ridiculous waste of time and energy on your part. I am not criticizing you for having sponsored content on your website so you can keep the lights on. Don't project that onto me like I'm some sort of selfish monster just overreacting. The ###### software that your site is peddling will one day replace you and every other writer if you continue excusing it. Maybe you ought to get another partner that respects real human creators. Just a thought. And furthermore, your "but everybody is doing it" excuse is pathetic. There are a million other products in the world you could advertise, you never HAD to.
    • Apple CarPlay to let you stream iPhone videos to your car screen by Sagar Naresh Bhavsar In an official post for developers, Apple announced that CarPlay users will be able to watch iPhone videos on their car's display. This feature will be powered by Apple AirPlay, allowing users to stream videos wirelessly from their iPhones to their car's infotainment screen. For safety reasons, users will be able to enjoy videos on their car's display, but only when their vehicle is parked. Users won't be able to watch videos while the car is on the move. The connected iPhone will automatically detect when the car moves and will stop playing the video to avoid driver distraction. According to Apple, devs need to "integrate support for CarPlay with AirPlay video to enable this feature in your car." Carmakers will need to add support for CarPlay with AirPlay video to let users watch videos in their cars. So, this may take a while to roll out. Apple hasn't specifically mentioned whether it will be exclusive to the newly introduced and limited CarPlay Ultra, or standard CarPlay users will also be able to enjoy videos through AirPlay on their car's screen. Also, there is no clarity if existing models will get support via software update, or if it will be limited to newer models only. On the Android side, Google has also announced a bunch of new capabilities soon to be coming to Android Auto. The company has extended support for weather apps in beta for Android Auto and has confirmed that Android Auto will soon support video and browser apps as well. Image via Depositphotos
    • For a moment there I thought it speeds up Windows installation, but alas...
    • PNY's DUO LINK V3 promises truly incredible speed for a USB flash drive by Sayan Sen PNY today announced a new USB flash drive that blends modern design with high performance. The new DUO LINK™ V3 supports USB 3.2 Gen 2 and offers both Type-C and Type-A connectors. Housed in a metal shell with a matte black finish, the device is built for users who need reliable results when transferring and storing files. The drive is available in multiple storage sizes ranging from 256GB, all the way up to 2TB. According to PNY, the DUO LINK V3 is engineered to deliver read speeds of up to 1,000 MB/s and write speeds of up to 800 MB/s. These speeds are significantly higher than those achieved by standard USB 2.0 flash drives, allowing users to move large files. The product reminds us of the "Poxiao" flash memory we recently covered, which promises even more incredible speeds. The DUO LINK V3 flash drive’s design uses the common swivel mechanism with dual connectors. One side of the device features a USB Type-C connector, while the other side has a USB Type-A connector. This dual-connector system is intended to maximize compatibility, making the DUO LINK V3 useful across a broad range of devices, including smartphones, tablets, laptops, and desktop computers. Thus, in practical terms, users can easily manage files between newer mobile devices and traditional computers without needing extra adapters. Durability and ease of use are emphasized by the metal housing, which is designed to protect the drive during everyday handling. A built-in key loop further supports portability, an important factor for professionals who need to carry important data with them. The device also maintains backward compatibility with older USB standards such as USB 3.2 Gen 1, USB 3.0, and USB 2.0, ensuring that users will find it adaptable to many existing systems. The DUO LINK V3 flash drive is now available directly from PNY and through selected online marketplaces like Amazon. The pricing starts at $34.99 for the 256GB version and reaches $159.99 for the 2TB version. This release provides users with a tool that meets the growing demands of data-intensive work while remaining accessible to those needing a reliable and versatile storage solution. This article was generated with some help from AI and reviewed by an editor. As an Amazon Associate we earn from qualifying purchases.
    • Message from Campbell Wilson, MD & CEO, Air India: https://www.instagram.com/airindia/reel/DKzZQUPhafx/
  • Recent Achievements

    • Week One Done
      fashionuae earned a badge
      Week One Done
    • One Month Later
      fashionuae earned a badge
      One Month Later
    • Week One Done
      elsafaacompany earned a badge
      Week One Done
    • Week One Done
      Yianis earned a badge
      Week One Done
    • Veteran
      Travesty went up a rank
      Veteran
  • Popular Contributors

    1. 1
      +primortal
      513
    2. 2
      ATLien_0
      265
    3. 3
      +FloatingFatMan
      195
    4. 4
      +Edouard
      174
    5. 5
      snowy owl
      125
  • Tell a friend

    Love Neowin? Tell a friend!