• 0

[PROBLEM] In need of an algorithm


Question

Fellow neowinians, I have a problem. I need to make a program in C to search through either a database or a file. However, since this database/file will have thousands of entries, I cannot simply use a for loop. The user enters a search phrase of up to 100 characters. Then I need to break this phrase up into groups of three characters and search every single entry in the database/file for each of those groups. Then, I need to break it up in 4 and do the same thing. Then 5. Then 6. Until the entire phrase is one group. The entries have a forename and a lastname, however, each of those may have one or more words in them. Could someone please shed some light on me. I have absolutely no idea how to do this. Thx in advance. BTW, if you need more info, please PM me or send me an email super.serge@gmail.com

Link to comment
Share on other sites

14 answers to this question

Recommended Posts

  • 0

Why not just search the one phrase to begin with?

If this is a school project, which I would venture a guess that it is a school project. Speed should not be a primary concern and unless the project specifcally states to not brute force it, I would.

Otherwise, I suggest loading the database/file into a btree. Running from memory and having a sorted data source will provide incredible speed; this is perhaps not the quickest method, but very fast all the same. No matter what, to achieve a great deal of speed, you must parse the text from memory.

Link to comment
Share on other sites

  • 0

Well, its not for a school project, as I am only 14 and don't learn programming in school yet. And yes, speed is a primary concern because if a user is searching for something, and it takes 15 minutes to give the results of the search, then the program isn't very effective. A friend told me to use a binary search. However, I am not sure how I would put that to use if I needed more than one result displayed. I.e., if I entered ken lon, I would want it to display Kenny Long and Ikenson (assuming that I had those names in the database/file).

Link to comment
Share on other sites

  • 0

1) A flat file that is 50MB can be parsed in about 3 seconds (depending a little on drive speed). This can give you an idea of the actual time required.

2) If you want speed and simplicity, set up a real database (MySQL or even Access would work well for the size you are using) and make queries to the database.

3) What environment are you using (Visual C++, etc.)? Does this application have a GUI or is it command-line?

4) Instead of searching in character groups, why not group based on spaces/end-of-line? Example, Ken Long Short would search for ken, long, and short and return all matching entries/lines. This would eliminate your looped routine (you would need a looped routine to go groups of 3, 4, 5, 6, etc.). You would only have to read through the data once (if line contains all search terms then diplay line).

Link to comment
Share on other sites

  • 0

IDGAF is correct, you should just load the information into a database and use SQL. I don't understand your search method what kind of data are you searching and what are you searching for?

Link to comment
Share on other sites

  • 0

Well, actually, the speed is a factor, because once I finish it in C, I will be transferring it to php, where it is very noticeably slower. Second, I was going to set up a database, but for now I am jsut using a file because its smaller and easier to manage (I think :p) Third, I am using Visual C++ (until i move it to php) and it is command line, because I have no use for it as an app, I just need the code and the speed. Fourth, I want to search in groups because not doing so would rely on user perfect data. Unfortunately, this is not always the case, so I have to be prepared. Although I might switch to that if I feel that it is going way too slow.

Link to comment
Share on other sites

  • 0

Why not do it with PHP to begin with and use a MySQL backend? You'll save yourself significant development time and the MySQL DB will handle the search routine for you.

This is how I would do it. There are great resources on the net for learning more about both. Start on MySQL's website.

Link to comment
Share on other sites

  • 0
Second, I was going to set up a database, but for now I am jsut using a file because its smaller and easier to manage (I think :p)

You think wrong.... with the sort of query you want to do a database engine is needed....

Then you can build a more complicated query using string manipulation within C (or PHP) to search for all the combinations necessarry.

And what about issues of concurrent access on the file? use a database (even access)... there's no point re-inventing the wheel on this one.

Good luck....

Link to comment
Share on other sites

  • 0

or now that I think of it... how about an XML file and XPath Queries?

Link to comment
Share on other sites

  • 0
or now that I think of it... how about an XML file and XPath Queries?

Aren't XPath queries slow with lots of data? I haven't seen an XSL engine that can compare to a DB. I think he's better off going the database route. Properly indexed tables will kick butt in a query.

Link to comment
Share on other sites

  • 0

All right, don't worry about the way Ill store the data for now. I simplified what I wanted to say tonumbers, and I need you guys to tell me what the fastest way of doing it is:

Find all occurances(sp?) of 34 in the following:

4589
2356
189645
11447
1268
06987
43456124
913451051
452671201
4564
1212
6578

Imagine that the list of numbers go on, but for sake of simplicity, I limited...btw, all numbers were completely random from the top row of the keyboard, so theyre not rigged in any way. All I want is to know what the fastest way to find all of the occurances would be. Nothing else :p . Oh, and another thing, think of the numbers as chars rather than numbers... :whistle:

Link to comment
Share on other sites

  • 0

SQL Query:

select * from <table name> where <column_name> like '%34%'

And let the database worry about the rest of the work ;)

Link to comment
Share on other sites

  • 0

true... how about serial file access and a regular expression?

Link to comment
Share on other sites

  • 0

hmmmm...I thought of a way, but the database might get kinda big (possibly). And I don't think I'll be able to make so that it searches for parts of words, only for full words. So, in the previous example with the numbers, the list of numbers would all possibly be separated and the 34 would have to be separated from the rest of the numbers for the search to return any results. If I was to do it that way, I could record every word found in the string in a database, and then put the name in the same row. That way, wehnever any of the words are searched for, only the names that have the words searched for are even checked. This would greatly narrow the amount of searching down to only the strings that need to be searched.

Link to comment
Share on other sites

This topic is now closed to further replies.
  • Recently Browsing   0 members

    • No registered users viewing this page.