Monday, September 12, 2005 7:41 PM by marcmill
Indexer Internals: What is the System Schema?
Before extending the system schema, you should consider whether your extensions are actually necessary given the hundreds of properties that already ship with Windows Vista.
You can view the schema as the indexer understands it by looking at the internal indexer schema configuration file in: %ALLUSERSPROFILE%\appdata\roaming\microsoft\usearch\data\config\schema.txt
Note that the slashes in the property names, e.g. System/Author, will change to dots in coming releases, so it will be System.Author instead.
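If you have notes or scripts that still use the slash form, the rename is a plain character substitution. Here is a minimal sketch in Python; the helper name to_dotted is made up for illustration, and it assumes a property name contains no slashes other than the separators.

```python
def to_dotted(name: str) -> str:
    """Convert a slash-separated property name to the dotted form.

    Hypothetical helper for illustration only; it is not part of the
    indexer and assumes '/' appears only as the separator.
    """
    return name.replace("/", ".")


print(to_dotted("System/Author"))
```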
The term Schema Proliferation refers to adding unnecessary schema. Schema proliferation leads to bad user experiences, for example when stacking in heterogeneous views. Let me elaborate: Imagine that FooCorp comes along and introduces their own idea of Author called FooAuthor. This works okay for scenarios in which I'm only dealing with .foo files. I can stack by FooAuthor in my views, I can query for documents whose FooAuthor is Fred, etc.
But the model breaks down when I am looking at views that contain files of types other than .foo, because all the other types use System.Author for their author fields and have never heard of FooAuthor. So if I want to assemble a view of all the .doc, .ppt, AND .foo files authored by Fred, I can't do this because the .foo files don't expose the System.Author property; their author lives only in FooAuthor.
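To make the failure concrete, here is a small Python sketch of the scenario. It is only a model: the list of dictionaries stands in for indexed items, the file names are invented, and the property name FooCorp.FooAuthor is a hypothetical spelling of FooCorp's property. None of this is the indexer's real API.

```python
# Hypothetical in-memory stand-in for indexed items (not the real indexer API).
files = [
    {"name": "budget.doc", "System.Author": "Fred"},
    {"name": "roadmap.ppt", "System.Author": "Fred"},
    {"name": "notes.doc", "System.Author": "Wilma"},
    # FooCorp stores the author under its own property instead of System.Author.
    {"name": "design.foo", "FooCorp.FooAuthor": "Fred"},
]


def authored_by(items, author, prop="System.Author"):
    """Return the names of items whose `prop` value equals `author`."""
    return [item["name"] for item in items if item.get(prop) == author]


# A mixed view keyed on the shared property silently misses the .foo file.
by_system_author = authored_by(files, "Fred")
# Querying FooCorp's property finds only the .foo file.
by_foo_author = authored_by(files, "Fred", prop="FooCorp.FooAuthor")

print(by_system_author)
print(by_foo_author)
```

Neither query returns all three of Fred's files. Had the .foo handler populated System.Author, the first query alone would have covered every file type.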
Avoiding schema proliferation requires a special amount of discipline. While it's obvious in my FooCorp example, it's not quite as clear in the example of, say, .pgn files (chess game files), in which you could argue that the players of the game are the de facto authors of the file. Similarly, the "chef" of a recipe file is kind of its author.