• 0

Determine if string can be split up into words


Question

I need to write an algorithm using dynamic programming to take an input in the form of a string without spaces or punctuation and return

a) if it can be split up into a sequence of words. "themailer" -> true; "thdfapple" -> false

b) the sequence produced. "the", "mailer"

In the dynamic programming approach, you have to find a subproblem... I've tried several with no avail... i feel like I'm closer right now with saying the subproblem is the longest word adjacent to the current position i'm testing

My general approach is to fill a table that is n x n long where n is the length of the string, the horizontal axis is the START of the substring and the vertical axis is the END of the substring (that i am testing)

Can anyone offer some insight / help on doing this? I realize this is highly esoteric and probably not explained very well... Sorry

9 answers to this question

Recommended Posts

  • 0

Hmm, that looks to be quite a challenge. You'd need some sort of dictionary to compare valid words to, but it might get a little interesting when you encounter words which could exist within other words, e.g. "ten" is a valid word, but so is "tenth". Unless you have a defined set of words and your work backwards (longest -> shortest) and break the string down that way? Never done anything like this before, so it would be interesting to see how you do it...

  • 0

If you want to do this with DYNAMIC PROGRAMMING you basically have to keep pointers of where your current words start/end and try to find a matching item. If the last word doesn't fit anywhere then try to move the previous one and so on.

  • Like 1
  • 0
  On 03/03/2010 at 20:38, Argote said:

If you want to do this with DYNAMIC PROGRAMMING you basically have to keep pointers of where your current words start/end and try to find a matching item. If the last word doesn't fit anywhere then try to move the previous one and so on.

That sounds like a good approach! Rather you than me though! Sounds very tricky indeed :rolleyes:

  • 0

The constraint is that it has to be done in O(n^2) time. Not quite sure what you mean Argote, but if your algorithm has to go back, then you might end up with n^n time because for every letter, you can have n mistakes.

I'm glad you seem to be familiar with dynamic programming though. I think I have a solution; perhaps you can verify that what I have runs in O(n^2) and is "dynamic programming"

pseudo python... the check to see if something is a word is considered to take unit time (that's just how it is.)

for start in range(0, length):
     for end in range(start+1, length):
         isWord = #check the current substring based on start and end.
         if isWord: table[start][end] = substring length

This should take n^2 time, according to a source, but it takes a little less since it only covers half of a table

due to the fact that end >= start is a constraint. This basically goes through all the substrings:

[a]bcd

[ab]cd

[abc]d

acd

a[bc]d (etc)

and when it hits one that is a word, it puts the length of that word into that cell (so to get the word, you trace the table UP that many rows)

I then propose to do a DEPTH FIRST SEARCH from the end of this table back, (as if it were a graph).

If I can get to the start state, then I can split the string up (and will have actually built it by then).

Depth First Search is O(|V| + |E|)... |V| = n, and |E| seems to be constrained to be n(n-1)=n^2-n in this case.

You have n nodes, and each node can have n-1 edges since it's basically an undirected graph.

so I have O(n^2 + n^2 -n) = O(n^2) runtime, I THINK....

please feel free to correct me or comment on it. I'm not sure if this is actually dynamic programming since I have to do a multipass (the DFS after I fill in the table)

This is my test string, by the way.... I hammered it out by combining interesting cases... it covers things such as choices like "than" or "thank" or "an"... "mail" or "mailer" or "email"

  thankillthemailer
t 00000000000000000
h 00000000000000000
a 00000000000000000
n 40000000000000000
k 50000000000000000
i 00000000000000000
l 00000000000000000
l 00004300000000000
t 00000000000000000
h 00000000000000000
e 00000000320000000
m 00000000430000000
a 00000000000000000
i 00000000000000000
l 00000000005400000
e 00000000000000000
r 00000000000600000

so the columns represent the START and the rows represent the END from my algorithm, so if you look at table['k','l'], you get a 4, which means that the 'l' is the end of a 4 letter word. (yes, i realize there are two l's, but it's easier to explain that way. i'm obviously talking about the second l)

  • 0

There is a much less complex and more elegant solution.

You propose to finish with a DFS working backwards to locate all solutions (paths through the directed graph back to the start). This itself is very close to the more elegant solution I will attempt to describe. We are indeed going to be starting at the end of the string rather than the beginning.

Let us call a string that has been successfully split into words a "sentence". Consider that every sentence must end with a word. Working from the end of the string forward, we will construct all valid sentences recursively.

Rather than iterating through all end and then all start positions, let us consider our end position static - we simply always end with the end of our "current" string as we are not interested in words ending before the end of our current string because those solutions don't provide a valid word at the end of the sentence. We will thus iterate only the start position, working backwards from end-1 to 0. If we ever get to start = 0 without finding a valid word, then there aren't any valid sentences because there are no valid words that the sentence can end with. We don't need to iterate again with a new end position, that would be wasted computation.

As we work backwards by start position, each time we encounter a valid word we will recurse with the substring preceding the word, and then prepend each result from the recursion to our "current" valid word to generate an array of our valid solution sentences. We don't recurse if start is 0 as we have reached the beginning of the string and can simply add the valid word to our array of solutions and return.

In pseudo-code (I have a running solution in PHP that I can generalize and post if there is interest, though it is much better left as an exercise for the reader)

 function find_sentences(string) {
     sentences = [];
     end = length(string) - 1;
     for (start = end; start >= 0; start--) {
         sublength = end - start + 1;
         substring = substring(string, start, sublength);
         if (is_word(substring)) {
             if (start > 0) {
                 next = substring(string, 0, start);
                 recurse = find_sentences(next);
                 foreach (recurse as result) {
                     sentences[] = result + ' ' + substring;
                 }
             } else {
                 sentences[] = substring;
             }
         }
     }
     return sentences;
 }

Enjoy! Any questions, I'll be happy to answer as best I can.

Also, thanks to ArmedMonkey - when I originally began on a solution for a similar problem, it was your solution that 'tipped' me off to the solution I described above.

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • HWiNFO 8.28 by Razvan Serea HWiNFO (Hardware Information) is a professional hardware information and diagnostic tool supporting latest components, industry technologies and standards. It's targeted to recognize and extract the most possible amount of information about computer's hardware which makes it suitable for users searching for driver updates, computer manufactures, system integrators and technical exteperts too. Retrieved information is presented in a logical and easily understandable form and can be exported into various types of reports. System health monitoring and basic benchmarking available too. HWiNFO32 & HWiNFO64 v8.28 changelog: Extended number of temperatures monitored (for CPUs with up-to 256 cores). Added OSD independent window without title bar. Added workaround for thermal throttling stuck sticky on Arrow Lake-H. Removed taskbar entry for OSD window. Improved support of next-generation AMD server and workstation platforms. Improved I3C bus synchronization on Intel Sapphire Rapids and later CPUs. Fixed sensor monitoring on some ASRock B850 series mainboards. Added support of ITE IT8698E. Enhanced sensor monitoring on GIGABYTE Q870M D3H. Enhanced support of Intel Wildcat Lake. Added monitoring of NVIDIA PCI Express Error Counters. Added AMD Radeon AI Pro R9700. Improved support of Intel Granite Rapids. Download: HWiNFO 8.28 | 17.8 MB (Free for Non-Commercial use) Download: HWiNFO Portable View: HWiNFO Website | HWiNFO Screenshot Get alerted to all of our Software updates on Twitter at @NeowinSoftware
    • Helpwire doesn't support sound transfer.
    • Here's your first look at Raycast for Windows, now in beta by David Uzondu Last September, we reported that Raycast, the popular macOS command bar and launcher tool, was set to make its way to Windows in 2025. Since its launch in 2020, Raycast has quickly become the go-to app that people recommend for new Mac users due to its speed, UI, features, and plethora of extensions. People have been begging for a Windows build, and even a Linux one, for a long time. Now, the company has published a YouTube video showcasing the beta on a Microsoft Surface and explaining how you can get your hands on it today. It's all opened by pressing the default hotkey (Alt + Space). This gives you a central command bar to launch apps, run commands, and search for files. You get a lot of the core features right now, like a calculator that handles natural language queries, a full clipboard history manager, and tools for creating text snippets and quicklinks. For example, typing .pdf instantly filters for just PDF documents, and using a forward slash lets you navigate through folder paths directly. Once you find a file, pressing Ctrl + K opens a contextual action menu with options like "Show in File Explorer" or "Copy File." The included calculator is also a powerhouse, handling everything from basic math to natural language queries, like "time in Barcelona" or "days until August 12." The full clipboard history manager is a massive upgrade over the Windows default. It keeps a running log of everything you copy to your clipboard, including blocks of text, links, images, and files. You can open the clipboard history and search through every item you've copied. It also comes with a powerful filtering system. By pressing Ctrl + P, you can filter your history to show only text, images, links, or even just colors. Its Quick AI feature is also included and will be free for everyone during the beta period. That means you can ask it questions and get answers without needing a Pro subscription for now. Quick AI is built on a set of AI models, like GPT-4o mini, which maintains conversational context, allowing for natural follow-up questions, and you can browse your entire chat history. There's also an "AI search" feature, which is listed as coming soon. The most surprising part is the support for third-party extensions right out of the gate. Raycast built its reputation on an extensive library of integrations that connect it to services like Slack, YouTube, and GitHub. While some of these extensions are written for specific macOS features and are not compatible, many of the extensions are written in JavaScript, which means a huge number of them already work from day one on Windows. Image: @alvaniss1g on X As you can guess, the beta is not feature-complete. A lot of the heavy-hitting Pro features are still on the roadmap and are promised to be coming soon. This includes Cloud Sync to keep your settings consistent across machines, as well as the much-loved Raycast Notes feature. If you are a Pro user on Mac looking to switch, you may want to wait a little longer for full parity. Snippet expansion and calendar integration are also on the to-do list for future updates. Getting into the beta is a bit of a process. Access is being managed through a gradual rollout, with priority given to users who signed up for the waitlist announced last year. If you are on the list, you can expect to receive an email soon. And if you don't like the waitlist, your best bet is to find a friend who already has an invite code, as each invited user gets a few extra codes to share. Sorry, Linux users, but there is no word on your build at the moment, and we're not holding our breath.
    • We M$ and our 801 loved partner$ are having a data party !
    • Wise Disk Cleaner 11.2.4 by Razvan Serea Wise Disk Cleaner is a free disk utility designed to help you keep your disk clean by deleting any unnecessary files. Usually, these unnecessary, or junk files appear as a result of program's incomplete uninstalls, or Temporary Internet Files. It is best if these files are wiped out from time to time, since they may, at some point, use a considerable amount of space on your drives. Wise Disk Cleaner, with its intuitive and easy to use interface, helps you quickly wipe out all the junk files. Using the program is indeed easy. It also works fast when both scanning for files and deleting files. The new Wise Disk Cleaner has more advantages: improved performance, better interface and scans/cleans more thoroughly. Wise Disk Cleaner Free provides lifetime free update service and Unlimited Free technical support. The first Slimming System software Wise Disk Cleaner is the first system slimming tool, which will help you to remove Windows useless files that you don't need, such as Korean IME, Windows Sample music, videos, pictures, Installers and Uninstallers of Updates Patches etc. Wise Disk Cleaner 11.2.4 Build 844 changelog: Improved cleaning rules for WPS Office and Mozilla Firefox. Added cleaning support for Jetbrains Pycharm, Jetbrains WebStorm, Jitsi, Visual Studio Code, jv16 PowerTools, Kakao Talk, Kingo Root, Launchy, Layers of Fear, LBRY, League of Legends, Leapfrog Connect, and Leawo Prof Media. Fixed minor bugs from the previous version. Download: Wise Disk Cleaner 11.2.4 | 6.9 MB (Freeware) Download: Portable Wise Disk Cleaner 11.2.4 | 7.3 MB View: Wise Disk Cleaner Home Page | Screenshot Get alerted to all of our Software updates on Twitter at @NeowinSoftware
  • Recent Achievements

    • Conversation Starter
      Kavin25 earned a badge
      Conversation Starter
    • One Month Later
      Leonard grant earned a badge
      One Month Later
    • Week One Done
      pcdoctorsnet earned a badge
      Week One Done
    • Rising Star
      Phillip0web went up a rank
      Rising Star
    • One Month Later
      Epaminombas earned a badge
      One Month Later
  • Popular Contributors

    1. 1
      +primortal
      537
    2. 2
      ATLien_0
      205
    3. 3
      +FloatingFatMan
      167
    4. 4
      Michael Scrip
      151
    5. 5
      Som
      127
  • Tell a friend

    Love Neowin? Tell a friend!