2009-02-27

Data Overview 2009

At the start of 2008 I posted a data overview, showing how much things had changed since 2006. This should probably become a yearly tradition, so here's the official 2009 overview:


Not a whole lot of change. So far as I can tell by doing a "blink compare", at this scale the only differences are:
Anything else I can plug in, dear readers?

(I used TortoiseIDiff to do the blink comparison. I used a similar technique while doing the Solomani and Aslan Dotmap Reconstruction.) Ω

2009-02-20

More search tweaks

Okay, a few more tweaks:
  • The handling of multi-word searches is now "Web-like". Multiple words in a search turn into multiple AND clauses. So "so ri" would find the Solomani Rim sector.
  • You can specify exact: as a prefix to force an exact name match of that term. So "exact:sol" will find only the Sol subsector.
  • You can specify uwp: as a prefix to match a UWP. (Searching only for XXXXXXX-X remains as a shortcut for a UWP search.)
  • If you use wildcards, the "prefix search" functionality is turned off. That is, normally "reg" would match "Regina" and "Beta Regilis", but "reg*" will only match names starting with "reg".
And you can combine these, for example:

  • t* uwp:*f - find worlds starting with T with Tech Level F
  • exact:terra uwp:a* - find worlds named Terra with Class A starports
Limitations:
  • Searches are performed on specific item data only, not the context of the item. By that I mean: you can't search on "solomani rim uwp:a*" since it would only search for worlds named "solomani" and "rim".
  • Search results may include alternate languages or spellings that are known to the site metadata but aren't shown. For example, Solomani Rim is known as Kushuggi in Vilani, so "kush*" would find it, although it won't be apparent why.
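For the curious, here's a rough Python sketch of how term handling like this could work. It is not the site's actual code; the World record, the SAMPLE list, and its placeholder UWPs are made up for illustration, and I'm assuming UWP terms must match the whole UWP while wildcard name terms are anchored at the start of the name.

```python
import re
from dataclasses import dataclass


@dataclass
class World:
    name: str
    uwp: str


# Hypothetical sample data: the names come from the examples above,
# the UWPs are placeholders, not real values.
SAMPLE = [
    World("Terra", "A000000-F"),
    World("Regina", "A000000-C"),
    World("Beta Regilis", "B000000-8"),
]

# Assumed shape of the XXXXXXX-X shortcut: seven characters, hyphen, one character.
UWP_SHAPE = re.compile(r"[0-9A-Za-z]{7}-[0-9A-Za-z]")


def _wild(pattern: str, open_ended: bool) -> re.Pattern:
    """Turn a pattern using * or % wildcards into a case-insensitive regex."""
    body = ".*".join(re.escape(part) for part in re.split(r"[%*]", pattern))
    if open_ended:
        body += ".*"  # implicit trailing wildcard
    return re.compile(body, re.IGNORECASE)


def term_matches(term: str, world: World) -> bool:
    lowered = term.lower()
    if lowered.startswith("exact:"):
        # exact: forces a whole-name match
        return world.name.lower() == lowered[len("exact:"):]
    if lowered.startswith("uwp:"):
        # uwp: matches against the whole UWP, wildcards allowed
        pattern = _wild(term[len("uwp:"):], open_ended=False)
        return pattern.fullmatch(world.uwp) is not None
    if UWP_SHAPE.fullmatch(term):
        # Bare XXXXXXX-X is a shortcut for an exact UWP search
        return world.uwp.upper() == term.upper()
    if "*" in term or "%" in term:
        # Wildcards turn off word-prefix matching: anchor at the start of the name
        return _wild(term, open_ended=True).fullmatch(world.name) is not None
    # Default: match the start of any word in the name
    return any(word.lower().startswith(lowered) for word in world.name.split())


def search(query: str, worlds: list[World]) -> list[World]:
    """Every whitespace-separated term is an AND clause."""
    terms = query.split()
    return [w for w in worlds if all(term_matches(t, w) for t in terms)]
```

With the sample data, search("reg", SAMPLE) returns Regina and Beta Regilis, while search("reg*", SAMPLE) returns only Regina. The first limitation shows up too: search("solomani rim uwp:a*", SAMPLE) returns nothing, because "solomani" and "rim" are tested against world names rather than the sector the world sits in.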
Ω

2009-02-19

Search: Now with UWPs

Has this ever happened to you?
Damn... where did I leave that world?!?! I know its UWP is A7899B9-B but I can't for the life of me remember where I left it! Well, I'm not sure... maybe it was B7899B9-C or A7899B9-C... ugh...
If so, your days of trouble are over! You can now search for UWPs. Here's how it works:
  • By default, search still only looks for names (sectors, subsectors, and worlds), matching starts of words (so "sol" matches "Solomani Rim", "Sol", and "Nowa Sol")
  • If your search term has the pattern XXXXXXX-X (7-1) the search looks for exact matches of UWPs instead (not prefixes)
  • You can also make it explicit and use a uwp: prefix on the search to force UWP matching
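To make the rules above concrete, here's a minimal Python sketch of how a search term might be classified as a name search or a UWP search. It's an illustration only, not the site's code: the classify function and the exact character set accepted in the XXXXXXX-X shape are my assumptions.

```python
import re

# Assumed "7-1" shape: seven alphanumeric characters, a hyphen, one more.
# The real check may be stricter about which characters are allowed.
UWP_SHAPE = re.compile(r"[0-9A-Za-z]{7}-[0-9A-Za-z]")


def classify(term: str) -> tuple[str, str]:
    """Return (kind, value), where kind is "uwp" or "name"."""
    term = term.strip()
    if term.lower().startswith("uwp:"):
        # Explicit prefix always forces UWP matching
        return ("uwp", term[len("uwp:"):])
    if UWP_SHAPE.fullmatch(term):
        # Looks like a UWP, so treat it as one
        return ("uwp", term)
    # Everything else is a name search
    return ("name", term)
```

So classify("A7899B9-B") and classify("uwp:A7899B9-B") both come back as UWP searches, while classify("sol") stays a name search.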
And for added bonus:
  • Wildcard searches are supported, using either the % or * character. They mean the same - match zero or more characters. So you could search on R*g*a to find Regina. Note that searches have an implicit wildcard at the end, so that's the same as R*g*a*
You can combine wildcards with the uwp: prefix as well. The Search API documentation has been updated to reflect this.
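Here's a small Python sketch of the wildcard behavior, translating % and * into a regular expression. This is not how the site does it (a database would more likely use its own pattern matching); the function names and the implicit_suffix switch are hypothetical, and I'm assuming UWP matches cover the whole UWP with no implicit trailing wildcard.

```python
import re


def wildcard_to_regex(pattern: str, implicit_suffix: bool = True) -> re.Pattern:
    """Translate a pattern using * or % (zero or more characters) into a regex."""
    # Split on either wildcard, escape the literal pieces, rejoin with ".*"
    parts = re.split(r"[%*]", pattern)
    body = ".*".join(re.escape(part) for part in parts)
    if implicit_suffix:
        body += ".*"  # name searches get an implicit wildcard at the end
    return re.compile(body, re.IGNORECASE)


def matches(pattern: str, text: str, implicit_suffix: bool = True) -> bool:
    return wildcard_to_regex(pattern, implicit_suffix).fullmatch(text) is not None
```

With this, matches("R*g*a", "Regina") and matches("R%g%a", "Regina") are both true. For the missing world, a pattern like "*7899B9-*" with implicit_suffix=False would cover all three candidates: A7899B9-B, B7899B9-C, and A7899B9-C.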
A cookie to the first commenter that finds my missing world!
Ω

2009-02-18

Search: Better, Faster, Stronger... Geekier

I've taken the plunge - the search feature (and back end of the Search API) is now powered by a real honest-to-Turing database engine. Yes... I've entered the 1970s!

If you were playing with the site for the last hour or so, you may have noticed that search went wonky. As usual, code that worked fine in my staging environment (my laptop) ran into scaling/performance issues when in the production environment - 130k separate INSERT statements took longer than ASP.NET likes to wait. A little fiddling with DataTables and SqlBulkCopy and now a rebuild of the database takes under ten seconds. Hooray!
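The general idea is to hand the database all the rows in one batch instead of one statement at a time. Here's a rough Python analogue using the standard sqlite3 module. To be clear, this is not SqlBulkCopy and not the site's code; the worlds table, its columns, and the rebuild function are made up for illustration.

```python
import sqlite3


def rebuild(rows):
    """Create a fresh in-memory database and load all rows in one batch."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE worlds (name TEXT, uwp TEXT)")
    # One transaction and one executemany call, rather than committing
    # a separate INSERT statement for every row.
    with conn:
        conn.executemany("INSERT INTO worlds (name, uwp) VALUES (?, ?)", rows)
    return conn


def count_rows(conn) -> int:
    return conn.execute("SELECT COUNT(*) FROM worlds").fetchone()[0]
```

Calling rebuild with any iterable of (name, uwp) pairs loads them in a single pass; the per-statement overhead is what tends to hurt when the row count gets into six figures.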

What you'll find:
  • Search is better. Instead of exact word matches it will now do stem matches. So "Sol" will find "Solomani Rim (sector)" and "Sol (subsector)".
  • Search is faster. Previously, the search was done by creating an in-memory hashtable-based index that could be disposed of at ASP.NET's whim. So the first search on a cold index would take >10 seconds; subsequent searches would be faster... until the server decided to reclaim memory. I don't believe my hosting provider executes the site across multiple hosts, but that's now feasible at least.
  • Search is stronger. The database can be rebuilt in a few seconds from the raw data files (and ginormous metadata map) but that will only be necessary on data updates.
Now, all that said...

  • It's not going to give any additional result data back just yet, nor allow searching on other fields. The outstanding request - that should now be easy to service - is to allow searching by UWP. I have the data ready, I just need to write the glue, but I'm out of time for tonight.
  • The results are fairly arbitrary - matching sectors, matching subsectors, then matching worlds - and a max of 20 results total. Previously that wasn't a problem, but with the looser matching it could be less than ideal. Feedback?
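Here's a minimal Python sketch of that result ordering, to show why the cap can bite. It's an illustration rather than the real query: the (kind, name) tuples and the search_names function are hypothetical, and I'm assuming a term matches when it starts any word in the name.

```python
# Sectors first, then subsectors, then worlds
KIND_ORDER = {"sector": 0, "subsector": 1, "world": 2}
MAX_RESULTS = 20


def search_names(term, items):
    """items is an iterable of (kind, name) pairs."""
    prefix = term.lower()
    hits = [
        (kind, name)
        for kind, name in items
        if any(word.lower().startswith(prefix) for word in name.split())
    ]
    # sort() is stable, so items of the same kind keep their original order
    hits.sort(key=lambda hit: KIND_ORDER[hit[0]])
    # Anything past the cap is silently dropped
    return hits[:MAX_RESULTS]
```

Because worlds sort last, a short term that matches many sectors and subsectors can push every world off the end of the list.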
What else do you want? What problems have you noticed? Ω