Optimize a search in an NSMutableDictionary

A data blender

This post will likely be updated a few times as I learn on the fly and relay experience gained information. If anyone is reading this.

I have a situation where I need to search through an NSMutableDictionary in the hopes of getting a fuzzy match to happen. The NSMutableDictionary in question is populated by items on an iOS device’s onboard music library. Currently I am testing with 1,777 items. That doesn’t seem like a huge amount, but running a traditional for…in loop takes about 6 seconds to run through the whole thing.

Here is my original loop code without surrounding method context (it’s simply lifted out to make things easier):

for (id key in songsDictionary) {
                NSString *thisSong = key;
                int suppliedCount = [stringValue length];
                int keyCount = [thisSong length];
                if(suppliedCount > keyCount){
                    match= [StringDistance stringDistance:thisSong :stringValue];
                } else {
                    match= [StringDistance stringDistance:stringValue :thisSong];
                }
                //Get the very best match there is to be found for song.
                if(match < currentFoundValue){
                    currentFoundValue = match;
                    test = [songsDictionary objectForKey:thisSong];
                    dis = [NSArray arrayWithObject:test];
                    collection = [[MPMediaItemCollection alloc] initWithItems:dis];
                }
            }

This takes about 6 seconds to process and get through. Now, someone suggested I try something else. So I looked at enumerateKeysAndObjectsWithOptions. I had to change my code a little bit (outside the loop) so they were block compliant. I never heard of that before, but basically you just prefix your vars with __block. That’s two underscores and lowercase block. So my MPMediaItem *test became __block MPMediaItem *test. Simple enough when you know about it, I had to Google after the compiler yelled at me, then I figured it out. Anyway, I tried this method and it’s shaved off MAYBE a second from the total time. Not a great optimization, but it’s something.

int suppliedCount = [stringValue length];
            [songsDictionary enumerateKeysAndObjectsWithOptions:NSEnumerationConcurrent usingBlock:^(id key, id obj, BOOL *stop) {
                NSString *thisSong = key;
                int keyCount = [thisSong length];
                if(suppliedCount > keyCount){
                    match = [StringDistance stringDistance:thisSong :stringValue];
                } else {
                    match = [StringDistance stringDistance:stringValue :thisSong];
                }
                if(match < currentFoundValue)
                {
                    currentFoundValue = match;
                    test = [songsDictionary objectForKey:thisSong];
                    dis = [NSArray arrayWithObject:test];
                    collection = [[MPMediaItemCollection alloc] initWithItems:dis];
                }
            }];

Now, someone else told me to try something else too. This part I have not done yet. I will quote them but won’t give named credit as I didn’t ask them permission to use their name.

Simple example. Init a NSMutableDictionary. For each string, compute a hash key as the sum of all chars composing it (in a short or int), divide it by 8 (that’s quick). This way, all the words that differ by just one char being exchanged (metathesis), or look more or less the same, will get an identical hash code.

In your mutable dictionary, use the hash key as the key, the object being an NSMutableArray where you put all the strings that share the same hash key. Then use the contents of this NSMutableArray as proposals for your user. That should be really fast. Refine the process if you need more grain.

You can try to use that (enumerateKeysAndObjectsWithOptions), but, basically, it is the same problem: you enumerate all entries in your dictionary and compute your string distance for each, which is cumbersome. The algorithm I propose you is way faster, because you don’t have to recompute this distance each time you search. But, once again, the crude way to compute a hash code might not suit your needs.

So now I am looking into trying this technique out and see what it produces. It’s going to be a bit of extra code to slap in there, but since I’m rocking SVN I can just blow it out if it doesn’t work correctly.

Update:

I think that I finally understand what that quote above is actually telling me to do. I wasn’t exactly clear before.

  • Compute a key for the song during NSMutableDictionary creation that’s based on the number of characters in the string. This could be an int for example.
  • All songs with the same length of characters in it’s string would be ganged up under the auspices of that key… the object for that key (let’s say for example the key is 20) would be an NSMutableDictionary whose key would be song title and object be an MPMediaItem.
  • So now when searching one doesn’t need to rip through every single item, but intelligently narrow down which (or which few if one employs a close but not exact match allowance) group of potential matches to dig through, saving oodles of time.

Instead of tearing up code to give this a go, I plan on just adding another step in the data generation method and optionally use it’s architecture to see how it speeds things up without degrading accuracy.

Fun stuff :)

Update2: 

That’s not what he was exactly on about. (I need clarification because I don’t quite get the intricacies quite yet).

> a song search for “Cold Wind to Valhalla” … char count of 21, look in the dictionary for key of int 21 (or close), and check it (probably another dictionary) for it’s key of the title to get the MPMediaItem.

Not quite. “Cold Wind…” you add all the ascii codes of the chars – you may want to lowercase the string before, so it is case insensitive, thus you hash key would be :
key = 21 [length of your string] << 24 + (‘c’ + ‘o’ + ‘l’ + ‘d’ + ‘ ‘ + …) % (2^24 – 1).

You then look up all the keys whose upper byte is 21 ((keys >> 24) == 21) and then selected the one whose hash key (hashkey & 0x00ffffff) is the nearest to (key & 0x00ffffff).

Update 2:

Wow did I get a lot of help. After a lot of massaging and patting, the search that took 6 seconds each time now takes 500ms to 1000ms. That’s a HUGE improvement!!! I don’t have the time to post up how it works just yet, but it’s a great solution. Thank you Vincent!

Related Posts Plugin for WordPress, Blogger...

Leave a Reply

Your email address will not be published. Required fields are marked *


3 × = six

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>