• 0

Determine if string can be split up into words


Question

I need to write an algorithm using dynamic programming to take an input in the form of a string without spaces or punctuation and return

a) if it can be split up into a sequence of words. "themailer" -> true; "thdfapple" -> false

b) the sequence produced. "the", "mailer"

In the dynamic programming approach, you have to find a subproblem... I've tried several with no avail... i feel like I'm closer right now with saying the subproblem is the longest word adjacent to the current position i'm testing

My general approach is to fill a table that is n x n long where n is the length of the string, the horizontal axis is the START of the substring and the vertical axis is the END of the substring (that i am testing)

Can anyone offer some insight / help on doing this? I realize this is highly esoteric and probably not explained very well... Sorry

9 answers to this question

Recommended Posts

  • 0

Hmm, that looks to be quite a challenge. You'd need some sort of dictionary to compare valid words to, but it might get a little interesting when you encounter words which could exist within other words, e.g. "ten" is a valid word, but so is "tenth". Unless you have a defined set of words and your work backwards (longest -> shortest) and break the string down that way? Never done anything like this before, so it would be interesting to see how you do it...

  • 0

If you want to do this with DYNAMIC PROGRAMMING you basically have to keep pointers of where your current words start/end and try to find a matching item. If the last word doesn't fit anywhere then try to move the previous one and so on.

  • Like 1
  • 0
  On 03/03/2010 at 20:38, Argote said:

If you want to do this with DYNAMIC PROGRAMMING you basically have to keep pointers of where your current words start/end and try to find a matching item. If the last word doesn't fit anywhere then try to move the previous one and so on.

That sounds like a good approach! Rather you than me though! Sounds very tricky indeed :rolleyes:

  • 0

The constraint is that it has to be done in O(n^2) time. Not quite sure what you mean Argote, but if your algorithm has to go back, then you might end up with n^n time because for every letter, you can have n mistakes.

I'm glad you seem to be familiar with dynamic programming though. I think I have a solution; perhaps you can verify that what I have runs in O(n^2) and is "dynamic programming"

pseudo python... the check to see if something is a word is considered to take unit time (that's just how it is.)

for start in range(0, length):
     for end in range(start+1, length):
         isWord = #check the current substring based on start and end.
         if isWord: table[start][end] = substring length

This should take n^2 time, according to a source, but it takes a little less since it only covers half of a table

due to the fact that end >= start is a constraint. This basically goes through all the substrings:

[a]bcd

[ab]cd

[abc]d

acd

a[bc]d (etc)

and when it hits one that is a word, it puts the length of that word into that cell (so to get the word, you trace the table UP that many rows)

I then propose to do a DEPTH FIRST SEARCH from the end of this table back, (as if it were a graph).

If I can get to the start state, then I can split the string up (and will have actually built it by then).

Depth First Search is O(|V| + |E|)... |V| = n, and |E| seems to be constrained to be n(n-1)=n^2-n in this case.

You have n nodes, and each node can have n-1 edges since it's basically an undirected graph.

so I have O(n^2 + n^2 -n) = O(n^2) runtime, I THINK....

please feel free to correct me or comment on it. I'm not sure if this is actually dynamic programming since I have to do a multipass (the DFS after I fill in the table)

This is my test string, by the way.... I hammered it out by combining interesting cases... it covers things such as choices like "than" or "thank" or "an"... "mail" or "mailer" or "email"

  thankillthemailer
t 00000000000000000
h 00000000000000000
a 00000000000000000
n 40000000000000000
k 50000000000000000
i 00000000000000000
l 00000000000000000
l 00004300000000000
t 00000000000000000
h 00000000000000000
e 00000000320000000
m 00000000430000000
a 00000000000000000
i 00000000000000000
l 00000000005400000
e 00000000000000000
r 00000000000600000

so the columns represent the START and the rows represent the END from my algorithm, so if you look at table['k','l'], you get a 4, which means that the 'l' is the end of a 4 letter word. (yes, i realize there are two l's, but it's easier to explain that way. i'm obviously talking about the second l)

  • 0

There is a much less complex and more elegant solution.

You propose to finish with a DFS working backwards to locate all solutions (paths through the directed graph back to the start). This itself is very close to the more elegant solution I will attempt to describe. We are indeed going to be starting at the end of the string rather than the beginning.

Let us call a string that has been successfully split into words a "sentence". Consider that every sentence must end with a word. Working from the end of the string forward, we will construct all valid sentences recursively.

Rather than iterating through all end and then all start positions, let us consider our end position static - we simply always end with the end of our "current" string as we are not interested in words ending before the end of our current string because those solutions don't provide a valid word at the end of the sentence. We will thus iterate only the start position, working backwards from end-1 to 0. If we ever get to start = 0 without finding a valid word, then there aren't any valid sentences because there are no valid words that the sentence can end with. We don't need to iterate again with a new end position, that would be wasted computation.

As we work backwards by start position, each time we encounter a valid word we will recurse with the substring preceding the word, and then prepend each result from the recursion to our "current" valid word to generate an array of our valid solution sentences. We don't recurse if start is 0 as we have reached the beginning of the string and can simply add the valid word to our array of solutions and return.

In pseudo-code (I have a running solution in PHP that I can generalize and post if there is interest, though it is much better left as an exercise for the reader)

 function find_sentences(string) {
     sentences = [];
     end = length(string) - 1;
     for (start = end; start >= 0; start--) {
         sublength = end - start + 1;
         substring = substring(string, start, sublength);
         if (is_word(substring)) {
             if (start > 0) {
                 next = substring(string, 0, start);
                 recurse = find_sentences(next);
                 foreach (recurse as result) {
                     sentences[] = result + ' ' + substring;
                 }
             } else {
                 sentences[] = substring;
             }
         }
     }
     return sentences;
 }

Enjoy! Any questions, I'll be happy to answer as best I can.

Also, thanks to ArmedMonkey - when I originally began on a solution for a similar problem, it was your solution that 'tipped' me off to the solution I described above.

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • Apple Intelligence can now analyze your iPhone's screen, offers live translation by Aditya Tiwari Alongside iOS 26, iPadOS 26, macOS 26, watchOS 26, tvOS 26, and new AirPods features, Apple announced some stuff for Apple Intelligence at WWDC 2025. A highlight of the latest announcements is that developers can now access Apple's on-device foundation models, which power Apple Intelligence, to enhance their apps. Apple's Foundation Models framework enables developers to create AI-powered features and experiences that can also operate offline. "The framework has native support for Swift, so app developers can easily access the Apple Intelligence model with as few as three lines of code," Apple said, adding that the access is offered free of charge. Apple has introduced several new features for the general public. For starters, the new Live Translation feature works across Messages, FaceTime, and phone calls, using "Apple-built models that run entirely on device" to translate messages from one language to another. In other words, the feature can translate written text in Messages into the recipient's preferred language and display translated live captions while still hearing the speaker’s voice in FaceTime. Meanwhile, the translated text is spoken aloud on regular phone calls throughout the conversation. Visual Intelligence can now access a user's iPhone screen to answer questions and take action on the content being viewed on the screen across apps. Users can ask ChatGPT for details on specific objects and what they're looking at on their screen to learn more. They can also search Google, Etsy, and other supported apps to find similar images and products. Visual Intelligence can also identify events displayed on the screen and suggest adding them to the user's calendar. The updated Genmoji feature lets users mix emojis and combine them with text descriptions to generate something new. They can change expressions and adjust personal attributes, such as hairstyle, to match the latest look when making Genmojis inspired by their friends and family members. Image Playground has been updated to support new styles generated by ChatGPT, such as oil paintings or vector art. When users have a specific idea in mind, they can use the "Any Style" option. Apple Intelligence also powers the Shortcuts app, using on-device processing or Private Cloud Compute to generate responses that feed into the rest of the shortcut while maintaining privacy. For instance, Apple explained that "a student can build a shortcut that uses the Apple Intelligence model to compare an audio transcription of a class lecture to the notes they took, and add any key points they may have missed." Apple is integrating its AI features into more apps with subsequent OS updates. The Wallet app can now summarize order tracking details from emails sent by merchants and delivery carriers. It lets users check their full order details, progress notifications, and other details. In Messages, Apple Intelligence can suggest where a poll might come in handy, and users can create backgrounds to fit their chats using Image Playground. The new Apple Intelligence features are available for testing on iPhone, iPad, Mac, Apple Watch, and Apple Vision Pro through the Apple Developer Program. By the end of the year, Apple's AI suite will support more languages, including Danish, Dutch, Norwegian, Portuguese (Portugal), Swedish, Turkish, Chinese (traditional), and Vietnamese. You can read about Apple Intelligence in the official announcement post on Apple Newsroom.
    • Absolutely. Glass widgets, tabs, docks, Rainmeter, MicaForEveryone (gorgeous full glass explorer windows), etc. some of us never left the glass ecosphere. I actually hope this encourages MS to put those Acrylic/Mica/Glass hooks back into the OS as an option. It doesn't have to be the default, but some of us would love the choice.
    • I had deleted the folder after installing the update and then re-created the inetpub folder. Ran the script today even though the folder exists, it does some minor changes to the permissions and to the order of some.
    • Creating 1,250 jobs...in order to destroy two orders of magnitude more.
  • Recent Achievements

    • Rookie
      CHUNWEI went up a rank
      Rookie
    • Enthusiast
      the420kid went up a rank
      Enthusiast
    • Conversation Starter
      NeoToad777 earned a badge
      Conversation Starter
    • Week One Done
      VicByrd earned a badge
      Week One Done
    • Reacting Well
      NeoToad777 earned a badge
      Reacting Well
  • Popular Contributors

    1. 1
      +primortal
      471
    2. 2
      +FloatingFatMan
      268
    3. 3
      ATLien_0
      256
    4. 4
      Edouard
      200
    5. 5
      snowy owl
      181
  • Tell a friend

    Love Neowin? Tell a friend!