• 0

Determine if string can be split up into words


Question

I need to write an algorithm using dynamic programming to take an input in the form of a string without spaces or punctuation and return

a) if it can be split up into a sequence of words. "themailer" -> true; "thdfapple" -> false

b) the sequence produced. "the", "mailer"

In the dynamic programming approach, you have to find a subproblem... I've tried several with no avail... i feel like I'm closer right now with saying the subproblem is the longest word adjacent to the current position i'm testing

My general approach is to fill a table that is n x n long where n is the length of the string, the horizontal axis is the START of the substring and the vertical axis is the END of the substring (that i am testing)

Can anyone offer some insight / help on doing this? I realize this is highly esoteric and probably not explained very well... Sorry

9 answers to this question

Recommended Posts

  • 0

Hmm, that looks to be quite a challenge. You'd need some sort of dictionary to compare valid words to, but it might get a little interesting when you encounter words which could exist within other words, e.g. "ten" is a valid word, but so is "tenth". Unless you have a defined set of words and your work backwards (longest -> shortest) and break the string down that way? Never done anything like this before, so it would be interesting to see how you do it...

  • 0

If you want to do this with DYNAMIC PROGRAMMING you basically have to keep pointers of where your current words start/end and try to find a matching item. If the last word doesn't fit anywhere then try to move the previous one and so on.

  • Like 1
  • 0
  On 03/03/2010 at 20:38, Argote said:

If you want to do this with DYNAMIC PROGRAMMING you basically have to keep pointers of where your current words start/end and try to find a matching item. If the last word doesn't fit anywhere then try to move the previous one and so on.

That sounds like a good approach! Rather you than me though! Sounds very tricky indeed :rolleyes:

  • 0

The constraint is that it has to be done in O(n^2) time. Not quite sure what you mean Argote, but if your algorithm has to go back, then you might end up with n^n time because for every letter, you can have n mistakes.

I'm glad you seem to be familiar with dynamic programming though. I think I have a solution; perhaps you can verify that what I have runs in O(n^2) and is "dynamic programming"

pseudo python... the check to see if something is a word is considered to take unit time (that's just how it is.)

for start in range(0, length):
     for end in range(start+1, length):
         isWord = #check the current substring based on start and end.
         if isWord: table[start][end] = substring length

This should take n^2 time, according to a source, but it takes a little less since it only covers half of a table

due to the fact that end >= start is a constraint. This basically goes through all the substrings:

[a]bcd

[ab]cd

[abc]d

acd

a[bc]d (etc)

and when it hits one that is a word, it puts the length of that word into that cell (so to get the word, you trace the table UP that many rows)

I then propose to do a DEPTH FIRST SEARCH from the end of this table back, (as if it were a graph).

If I can get to the start state, then I can split the string up (and will have actually built it by then).

Depth First Search is O(|V| + |E|)... |V| = n, and |E| seems to be constrained to be n(n-1)=n^2-n in this case.

You have n nodes, and each node can have n-1 edges since it's basically an undirected graph.

so I have O(n^2 + n^2 -n) = O(n^2) runtime, I THINK....

please feel free to correct me or comment on it. I'm not sure if this is actually dynamic programming since I have to do a multipass (the DFS after I fill in the table)

This is my test string, by the way.... I hammered it out by combining interesting cases... it covers things such as choices like "than" or "thank" or "an"... "mail" or "mailer" or "email"

  thankillthemailer
t 00000000000000000
h 00000000000000000
a 00000000000000000
n 40000000000000000
k 50000000000000000
i 00000000000000000
l 00000000000000000
l 00004300000000000
t 00000000000000000
h 00000000000000000
e 00000000320000000
m 00000000430000000
a 00000000000000000
i 00000000000000000
l 00000000005400000
e 00000000000000000
r 00000000000600000

so the columns represent the START and the rows represent the END from my algorithm, so if you look at table['k','l'], you get a 4, which means that the 'l' is the end of a 4 letter word. (yes, i realize there are two l's, but it's easier to explain that way. i'm obviously talking about the second l)

  • 0

There is a much less complex and more elegant solution.

You propose to finish with a DFS working backwards to locate all solutions (paths through the directed graph back to the start). This itself is very close to the more elegant solution I will attempt to describe. We are indeed going to be starting at the end of the string rather than the beginning.

Let us call a string that has been successfully split into words a "sentence". Consider that every sentence must end with a word. Working from the end of the string forward, we will construct all valid sentences recursively.

Rather than iterating through all end and then all start positions, let us consider our end position static - we simply always end with the end of our "current" string as we are not interested in words ending before the end of our current string because those solutions don't provide a valid word at the end of the sentence. We will thus iterate only the start position, working backwards from end-1 to 0. If we ever get to start = 0 without finding a valid word, then there aren't any valid sentences because there are no valid words that the sentence can end with. We don't need to iterate again with a new end position, that would be wasted computation.

As we work backwards by start position, each time we encounter a valid word we will recurse with the substring preceding the word, and then prepend each result from the recursion to our "current" valid word to generate an array of our valid solution sentences. We don't recurse if start is 0 as we have reached the beginning of the string and can simply add the valid word to our array of solutions and return.

In pseudo-code (I have a running solution in PHP that I can generalize and post if there is interest, though it is much better left as an exercise for the reader)

 function find_sentences(string) {
     sentences = [];
     end = length(string) - 1;
     for (start = end; start >= 0; start--) {
         sublength = end - start + 1;
         substring = substring(string, start, sublength);
         if (is_word(substring)) {
             if (start > 0) {
                 next = substring(string, 0, start);
                 recurse = find_sentences(next);
                 foreach (recurse as result) {
                     sentences[] = result + ' ' + substring;
                 }
             } else {
                 sentences[] = substring;
             }
         }
     }
     return sentences;
 }

Enjoy! Any questions, I'll be happy to answer as best I can.

Also, thanks to ArmedMonkey - when I originally began on a solution for a similar problem, it was your solution that 'tipped' me off to the solution I described above.

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.
  • Posts

    • Microsoft SharePoint gets Modern Page Templates to speed up page creation by Paul Hill SharePoint, Microsoft’s enterprise content management solution that allows organizations to set up internal pages and share documents, has just got a big update with the new Modern Page Templates. Microsoft said that it’s going to be rolling out the feature to customers globally between early July and early August. Organization members that want to set up pages on SharePoint using the new templates can do so from multiple entry points including the Site Home (the home of a Communication or Teams site), App Bar (the persistent navigation pane on the left of SharePoint), and the New Web Part (a webpart that displays and creates news posts directly on pages). By giving colleagues multiple access points to the new templates, there’s more chance they’ll be found and used. By including plenty of swanky templates, Microsoft is making it so you can spend less time designing pages, and more time filling them with content that really matters. If you want to create a new page using the templates, just go to New > Page and select a template “From Microsoft”. If you were making a news post, you would go to New > News post then you’d be presented with the new Template Gallery where you can select a custom template created by your colleagues under “Saved on this site”. While these new templates are coming to customers globally next month, Microsoft is already busy working on new features to make templates easier to discover and make them more useful. For example, it is building tenant-wide custom templates that can be published across sites for organization-wide use, and it’s working on Copilot templates that you can use when creating pages with Copilot. Neither of these features was given a release timeline. Customers won’t need to do anything to start using Modern Page Templates as they’ll be available automatically as it rolls out worldwide across tenants. For those who want to dive deeper, Microsoft will be publishing additional documentation and guidance about this feature soon.
    • Times are changing: https://arstechnica.com/gaming...ndows-11-ars-testing-finds/
    • Unless there is “bug” that all of a sudden sends your messages to Meta. Where have I heard this before?!
    • Maybe, just maybe... and it isn't you... there are some people who like the Windows 11 UI (for whatever reason) and want a better backend.
    • Gundams to arrive in Call of Duty: Mobile with new mech mode and unique third-person combat by Paul Hill Activision has announced that Call of Duty: Mobile will see the launch of Season 6 “Gundams Arrive” on July 2 at 5PM Pacific Time. This major collaboration between the world’s most popular FPS franchise and the Gundam franchise will introduce a new, limited-time mode called Gundam Team Deathmatch where 8 players will face off 4v4 and pilot Gundam-themed operators such as Ethan - Freedom Gundam, Reaper - Sazabi Gundam, Proton - v Gundamo, Deathscythe Gundam (EW). When playing in this game mode, players will notice a switch from the usual first-person view to the third-person perspective. The new game mode will also feature specific abilities and weapons that are unique to Gundam suits rather than player loadouts. The new Gundam Team Deathmatch mode will be played on a new map called interstellar space station, which has been designed for this mode. When playing, you’ll discover that the mech suits offer specialized mobility such as dodge, sprint, and vertical jets. In the post-match, players will be able to watch Gundam operator animations with unlockable rewards for viewing. You can unlock more animations by participating in Gundam Team Deathmatch, normal Multiplayer, and Battle Royale modes. There will also be Gundam-themed in-game events, such as Survival of the Fittest, which will give players free rewards like the new legendary weapon J358 — Fin Funnel v Gundam, Urban Tracker — Defense Force, Cyro Bomb — Haro (reskin), Emote — Haro Team, new camos, and more. Players will also be able to obtain a variety of items through Season 6 Battle Pass free and premium tiers, including sci-fi-themed Operators and Weapon Blueprints. Players on the free tier will get access to the bolt-action 3-Line Rifle based on a World War II design and is capable of inflicting high damage with high accuracy. Free tier players will also have the chance to earn other rewards such as Skins, Weapons Blueprints, Vault Coins, and more. Players looking to spend money can get the Premium Pass. These players will have a chance to get all of the content from Season 6 including tactical warriors like Silver — Chrome Dome Reskin, Misty — Science Pilot, Atlas — Dust Ranger, and The Marshal — Rock Hound; and Weapon Blueprints like the BP50 — Pathripper, Oden — Maevwat Technical, PDW-57 — Rocket Re-Entry, BY15 — Dark Moon, and the 3-Line Rifle — Geo Thermal Line, based on the new Season 6 weapon. There’s also Battle Pass Subscription which gives you additional monthly rewards along with a 10% boost to Player and Weapon XP, discount coupons, and limited discounts on 10x crate pulls. Activision also stated that Mythic Drops are returning to the Mythic store and that Battle Pass Vault is getting Season 9 — Zombies Are Back (2022) and Season 6 — Templar's Oath (2023).
  • Recent Achievements

    • Week One Done
      pcdoctorsnet earned a badge
      Week One Done
    • Rising Star
      Phillip0web went up a rank
      Rising Star
    • One Month Later
      Epaminombas earned a badge
      One Month Later
    • One Year In
      Bert Fershner earned a badge
      One Year In
    • Reacting Well
      ChrisOdinUK earned a badge
      Reacting Well
  • Popular Contributors

    1. 1
      +primortal
      539
    2. 2
      ATLien_0
      205
    3. 3
      +FloatingFatMan
      169
    4. 4
      Michael Scrip
      150
    5. 5
      Som
      126
  • Tell a friend

    Love Neowin? Tell a friend!