The PDC has happened, which means two things. I can post some of my (slightly self-censored) reactions to the show, and I can talk about what we've disclosed about Whidbey and Longhorn more freely. In this particular case, I had promised to talk about the deep changes we're making in Whidbey to allow you to host the CLR in your process. As you'll see, I got sidetracked and ended up discussing Application Compatibility instead.
But first, my impressions of the PDC:
The first keynote, with Bill, Jim & Longhorn, was guaranteed to be good. It had all the coolness of Avalon, WinFS and Indigo, so of course it was impressive. In fact, throughout all the sessions I attended, I was surprised by the apparent polish and maturity of Longhorn. In my opinion, Avalon looked like the most mature and settled. Indigo also looked surprisingly real. WinFS looked good in the keynote, where it was all about the justification for the technology. But in the drill-down sessions, I had the sense that it's not as far along as the others.
Hopefully all the attendees realize that Longhorn is still a long way off. It's hard to see from the demos, but a lot of fundamental design issues and huge missing pieces remain.
Incidentally, I still can't believe that we picked WinFX to describe the extended managed frameworks and WinFS to describe the new storage system. One of those names has got to go.
I was worried that the Whidbey keynote
on Tuesday would appear mundane and old-fashioned by comparison. But to an audience
of developers, Eric's keynote looked very good indeed. Visual Studio looked
better than I've ever seen it. The device app was so easy to write that I feel
I could build a FedEx-style package tracking application in a weekend.
The high point of this keynote was ASP.NET. I hadn't been paying attention to what they've done recently, so I was blown away by the personalization system and by the user-customizable web pages. If I had seen a site like that, I would have assumed the author spent weeks getting it to work properly. It's hard to believe this can all be done with drag-and-drop.
In V1, ASP.NET hit a home run by focusing
like a laser beam on the developer experience. Everyone put so much effort into
building apps, questioning why each step was necessary, and refining the process.
It's great to see that they continue to follow that same discipline. In the
drill-down sessions, over and over again I saw that focus resulting in a near perfect
experience for developers. There are
some other teams, like Avalon, that seem to have a similar religion and are obtaining
similar results. (Though Avalon desperately needs some tools support. Notepad is fine for authoring XAML in demos, but I wouldn't want to build a real application this way.)
Compared to ASP.NET, some other teams
at Microsoft are still living in the Stone Age. Those
teams are still on a traditional cycle of building features, waiting for customers
to build applications with those features, and then incorporating any feedback. Beta
is way too late to find out that the programming model is clumsy. We shouldn't be shirking our design responsibilities like this.
Anyway, the 3rd keynote (from Rick
Rashid & Microsoft Research) should have pulled it all together. I think
the clear message should have been something like:
Whidbey
is coming next and has great developer features. After that, Longhorn will arrive
and will change everything. Fortunately, Microsoft Research is looking 10+ years
out, so you can be sure we will increasingly drive the whole industry.
This should have been an easy story
to tell. The fact is that MSR is a world class research institution. Browse
the Projects, Topics or People categories at http://research.microsoft.com and
you'll see many name-brand researchers like Butler Lampson and Jim Gray. You
will see tremendous breadth in the areas under research, from pure math and algorithms
to speech, graphics and natural language. There
are even some esoterica like nanotech and quantum computing. We
should have used the number of published papers and other measurements to compare
MSR with other research groups in the software industry, and with major research universities. And
then we should have shown some whiz-bang demos of about 2 minutes each.
Unfortunately, I think we instead sent the message that "interesting technology comes from Microsoft product groups, while MSR is largely irrelevant." Yet
nothing could be further from the truth. Even
if I restrict consideration to the CLR, MSR has had a big impact. Generics
is one of the biggest features added to the CLR, C# and the base Frameworks in Whidbey. This
feature was added to the CLR by MSR team members, who now know at least as much about
our code base as we do. All the CLR's plans for significantly improved code quality and portable compilers depend on a
joint venture between MSR and the compiler teams. To
my knowledge, MSR has used the CLR to experiment with fun things like transparent
distribution, reorganizing objects based on locality, techniques for avoiding security
stack crawls, interesting approaches to concurrency, and more. SPOT (Smart Personal Objects Technology) is a wonderful example of what MSR has done with the CLR's basic IL and metadata design, eventually leading to a very cool product.
In my opinion, Microsoft Research
strikes a great balance between long term speculative experimentation and medium term
product-oriented improvements. I wish
this had come across better at the PDC.
Trends
In the 6+ years I've been at Microsoft, we've had 4 PDCs. This is the first one I've actually attended, because I usually have overdue work items or too many bugs. (I've missed all 6 of our mandatory company meetings for the same reason.) So I really don't have a basis for comparison.
I guess I had expected to be beaten
up about all the security issues of the last year, like Slammer and Blaster.
And I had expected developers to be interested in all aspects of security. Instead,
the only times the topic came up in my discussions were when I raised it.
However, some of my co-workers did
see a distinct change in the level of interest in security. For
example, Sebastian Lange and Ivan Medvedev gave a talk on managed security to an audience
of 700-800. They reported a real upswing
in awareness and knowledge on the part of all PDC attendees.
But consider a talk I attended on
Application Compatibility. At a time
when most talks were overflowing into the hallways, this talk filled less than 50
seats of a 500 to 1000 seat meeting room. I
know that AppCompat is critically important to IT. And
it's a source of friction for the entire industry, since everyone is reluctant to
upgrade for fear of breaking something. But
for most developers this is all so boring compared to the cool visual effects we can
achieve with a few lines of XAML.
Despite a trend to increased interest
in security on the part of developers, I suspect that security remains more of an
IT operations concern than it does a developer concern. And
although the events of the last year or two have got more developers excited about
security (including me!), I doubt that we will ever get developers excited about more
mundane topics like versioning, admin or compatibility. This
latter stuff is dead boring.
That doesn't mean that the industry
is doomed. Instead, it means that modern
applications must obtain strong versioning, compatibility and security guarantees
by default rather than through deep developer involvement. Fortunately,
this is entirely in keeping with our long term goals for managed code.
With the first release of the CLR,
the guarantees for managed applications were quite limited. We
guaranteed memory safety through an accurate garbage collector, type safety through
verification, binding safety through strong names, and security through CAS. (However,
I think we would all agree that our current support for CAS still involves far too
much developer effort and not enough automated guarantees. Our
security team has some great long-term ideas for addressing this.)
More importantly, we expressed programs
through metadata and IL, so that we could expand the benefits of reasoning about these
programs over time. And we provided metadata
extensibility in the form of Custom Attributes and Custom Signature Modifiers, so
that others could add to the capabilities of the managed environment without depending
on the CLR team's schedule.
FxCop (http://www.gotdotnet.com/team/fxcop/)
is an obvious example of how we can benefit from this ability to reason about programs. All
teams developing managed code at Microsoft are religious about incorporating this
tool into their build process. And since
FxCop supports adding custom rules, we have added a large number of Microsoft-specific
or product-specific checks.
Churn and Application Breakage
We also have some internal tools that
allow us to compare different versions of assemblies so we can discover inadvertent
breaking changes. Frankly, these tools are still maturing. Even in the Everett timeframe, they did a good job of catching blatant violations like the removal of a public method from a class or the addition of a method to an interface. But they didn't catch changes in serialization format, or changes to representation after marshaling through PInvoke or COM Interop. As a result, we shipped some unintentional breaking changes in Everett, and until recently we were on a path to do so again in Whidbey.
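To make the "blatant violation" checks concrete, here is a minimal Python sketch that diffs two versions of a simplified public surface. It flags a removed public member and a member added to an interface. The data model (a dict from type name to kind and member set) and the function `find_breaking_changes` are hypothetical. Real tools read assembly metadata. As noted above, a surface diff like this cannot see serialization-format or marshaling changes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified model of an assembly's public surface:
# type name -> (kind, set of public member signatures).
Surface = Dict[str, Tuple[str, Set[str]]]


def find_breaking_changes(old: Surface, new: Surface) -> List[str]:
    """Report removed types and members, and members added to existing interfaces."""
    problems: List[str] = []
    for type_name, (kind, old_members) in old.items():
        if type_name not in new:
            problems.append(f"removed type {type_name}")
            continue
        _, new_members = new[type_name]
        # Callers compiled against the old version may still reference these.
        for member in sorted(old_members - new_members):
            problems.append(f"removed public member {type_name}.{member}")
        if kind == "interface":
            # Existing implementers of the interface won't provide the new member.
            for member in sorted(new_members - old_members):
                problems.append(f"member added to interface {type_name}.{member}")
    return problems


v1: Surface = {"Widget": ("class", {"Draw()", "Resize(int)"}),
               "IShape": ("interface", {"Area()"})}
v2: Surface = {"Widget": ("class", {"Draw()"}),
               "IShape": ("interface", {"Area()", "Perimeter()"})}
report = find_breaking_changes(v1, v2)
```

Here `report` lists the removed `Widget.Resize(int)` and the new `IShape.Perimeter()`. Adding a method to a class is not reported, because existing callers are unaffected.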
As far as I know, these tools still don't track changes to CAS constructs, internal dependency graphs, thread-safety expectations, exception flow (including a static replacement for the checked exceptions feature), reliability contracts, or other aspects of execution. Some of these checks will probably be added over time, perhaps by adding additional metadata to assemblies to reveal the developer's intentions and to make automated validation more tractable. Other checks seem like research projects or are more appropriate for dynamic tools rather than static tools. It's very encouraging to see teams inside and outside of Microsoft working on this.
I expect that all developers will
eventually have access to these or similar tools from Microsoft or 3rd parties,
which can be incorporated into our build processes the way FxCop has been.
Sometimes applications break when
their dependencies are upgraded to new versions. The
classic example of this is Win95 applications which broke when the operating system
was upgraded to WinXP. Sometimes this
is because the new versions have made breaking changes to APIs. But sometimes it's because things are just "different." The classic case here is where a test case runs perfectly on a developer's machine, but fails intermittently in the test lab or out in the field. The difference in environment might be obvious, like a single-processor box vs. an 8-way. Yet all too often it's something truly subtle, like a DLL relocating when it misses its preferred address, or the order of DllMain notifications on a DLL_THREAD_ATTACH. In
those cases, the change in environment is not the culprit. Instead,
the environmental change has finally revealed an underlying bug or fragility in the
application that may have been lying dormant for years.
The managed environment eliminates
a number of common fragilities, like the double-free of memory blocks or the use of
a file handle or Event that has already been closed. But
it certainly doesn't guarantee that a multi-threaded program which appears to run
correctly on a single processor will also execute without race conditions on a 32-way
NUMA box. The author of the program must
use techniques like code reviews, proof tools and stress testing to ensure that his
code is thread-safe.
The situation that worries me the most is when an application
relies on accidents of current FX and CLR implementations. These
dependencies can be exceedingly subtle.
Here are some examples of breakage that we have encountered,
listed in the random order they occur to me:
1. Between V1.1 and Whidbey, the implementation of reflection has undergone a major overhaul to improve access times and memory footprint. One consequence is that the order of members returned from APIs like Type.GetMethods has changed. The old order was never documented or guaranteed, but we've found programs, including our own tests, which assumed stability here.
2. Structs and classes can specify Sequential, Explicit
or AutoLayout. In the case of AutoLayout,
the CLR is free to place members in any order it chooses. Except
for alignment packing and the way we chunk our GC references, our layout here is currently
quite predictable. But in the future
we hope to use access patterns to guide our layout for increased locality. Any
applications that predict the layout of AutoLayout structs and classes via unsafe
coding techniques are at risk if we pursue that optimization.
3. Today, finalization occurs on a single Finalizer thread. For
scalability and robustness reasons, this is likely to change at some point. Also,
the GC already perturbs the order of finalization. For
instance, a collection can cause a generation boundary to intervene between two instances
that are normally allocated consecutively. Within
a given process run, there will likely be some variation in finalization sequence. But
for two objects that are allocated consecutively by a single thread, there is a high
likelihood of predictable ordering. And
we all know how easy it is to make assumptions about this sort of thing in our code.
4. In an earlier blog (http://blogs.gotdotnet.com/cbrumme/PermaLink.aspx/e55664b4-6471-48b9-b360-f0fa27ab6cc0), I talked about some of the circumstances that impact when the JIT will stop reporting a reference to the GC. These include inlining decisions, register allocation, and obvious differences like X86 vs. AMD64 vs. IA64. Clearly we want the freedom to chase better code quality with JIT compilers and NGEN compilers in ways that will substantially change these factors. Just yesterday an internal team reported a GC bug (on multi-processor machines only) that we quickly traced to confusion over lifetime rules and bad practice in the application. One finalizable object was accessing some state in another finalizable object, in the expectation that the first object was live because it was the "this" argument of an active method call.
5. During V1.1 Beta testing, a customer complained about an application we had broken. This application contained unmanaged code that reached back into its caller's stack to retrieve a
GCHandle value at an offset that had been empirically discovered. The
unmanaged code then transitioned into managed and redeemed the supposed handle value
for the object it referenced. This usually
worked, though it was clearly dependent on filthy implementation details. Unfortunately,
the System.EnterpriseServices pathways leading to the unmanaged application were somewhat
variable. Under certain circumstances,
the stack was not what the unmanaged code predicted. In
V1, the value at the predicted spot was always a 0 and the redemption attempt failed
cleanly. In V1.1, the value at that stack
location was an unrelated garbage value. The
consequence was a crash inside mscorwks.dll and Fail Fast termination of the process.
6. In V1 and V1.1, Object.GetHashCode() can be used to obtain a hashcode for any object. However, our implementation happened to return values which tended to be small ascending integers. Furthermore, these values happened to be unique across all reachable instances that were hashed in this manner. In other words, these values were really object identifiers or OIDs. Unfortunately, this implementation was a scalability killer for server applications running on multi-processor boxes. So in Whidbey Object.GetHashCode() is now all we ever promised it would be: an integer with reasonable distribution but no uniqueness guarantees. It's a great value for use in Hashtables, but it's sure to disappoint some existing managed applications that relied on uniqueness.
7. In V1 and V1.1, all string literals are Interned as described in http://blogs.gotdotnet.com/cbrumme/PermaLink.aspx/7943b9be-cca9-41e1-8a83-3d7a0dbba270. I noted there that it is a mistake to depend on Interning across assemblies. That's because the other assembly might start to compose a String value which it originally specified as a literal. In Whidbey, assemblies can opt in or opt out of our Interning behavior. This new freedom is motivated by a desire to support faster loading of assemblies (particularly assemblies that have been NGEN'ed). We've seen some tests fail as a result.
8. I've seen some external developers use a very fragile technique based on their examination of Rotor sources. They navigate through one of System.Threading.Thread's private fields (DONT_USE_InternalThread)
to an internal unmanaged CLR data structure that represents a running managed thread. From
there, they can pluck interesting information like the Thread::ThreadState bit field. None
of these data structures are part of our contract with managed applications and all
of them are sure to change in future releases. The
only reason the ThreadState field is at a stable offset in our internal Thread struct
today is that its frequency of access merits putting it near the top of the struct
for good cache-line filling behavior.
9. Reflection allows highly privileged code to access private members of arbitrary types. I am aware of dozens of teams inside and outside of Microsoft which rely on this mechanism for shipping products. Some of these uses are entirely justified, like the way Serialization accesses private state that the type author marked as [Serializable()]. Many other uses are rather questionable, and a few are truly heinous. Taken
to the extreme, this technique converts every internal implementation detail into
a publicly exposed API, with the obvious consequences for evolution and application
compatibility.
10. Assembly loading and type resolution can happen on very different schedules, depending on how your application is running. We've seen applications that misbehave based on NGEN vs. JIT, domain-neutral vs. per-domain loading, and the degree to which the JIT inlines methods. For example, one application created an AppDomain and started running code in it. That code subsequently modified the private application directory and then attempted to load an assembly from that directory. Of course, because of inlining the JIT had already attempted to load the assembly with the original application directory and had failed. The correct solution here is to disallow any changes to an AppDomain's application directory after code starts executing inside that AppDomain. This directory should only be modifiable during the initialization of the AppDomain.
11. In prior blogs, I've talked about unhandled exceptions and the CLR's default policy for dealing with them. That
policy is quite involved and hard to defend. One
aspect of it is that exceptions that escape the Finalizer thread or any ThreadPool
threads are swallowed. This keeps the
process running, but it often leaves the application in an inconsistent state. For
example, locks may not have been released by the thread that took the exception, leading
to subsequent hangs. Now that the technology
for reporting process crashes via Watson dumps is maturing, we really want to change
our default policy for unhandled exceptions so that we Fail Fast with a process crash
and a Watson upload. However, any change
to this policy will undoubtedly cause many existing applications to stop working.
12. Despite the flexibility of CAS, most applications still run with Full Trust. I truly believe that this will change over time. For example, in Whidbey we will have ClickOnce permission elevation and in Longhorn we will deliver the Secure Execution Environment or SEE. Both of these features were discussed at the PDC. When we have substantial code executing in partial trust, we're going to see some unfortunate surprises. For example, consider message pumping. If a Single Threaded Apartment thread has some partial trust code on its stack when it blocks (e.g. Monitor.Enter on a contended monitor), then we will pump messages on that thread while it is blocked. If the dispatching of a message requires a stack walk to satisfy a security Full Demand, then the partially trusted code further back on the stack may trigger a security exception. Another example is related to class constructors. As you probably know, .cctor methods execute on the first thread that needs access to a class in a particular AppDomain. If the .cctor must satisfy a security demand, the success of the .cctor now depends on the accident of what other code is active on the thread's stack. Along the same lines, the .cctor method may fail if there is insufficient stack space left on the thread that happens to execute it. These are all well understood problems and we have plans for fixing them. But the fixes will necessarily change observable behavior for a class of applications.
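Example #6 is easy to reproduce in miniature. The Python sketch below uses a hypothetical class `Handle` whose hash deliberately has a tiny range. It stands in for any hash function that promises distribution but not uniqueness. A registry keyed by hash code, which treats the hash as an OID, silently loses objects when hashes collide. A registry keyed by the object itself uses the hash only to pick a bucket and then compares identity, so it keeps them all.

```python
class Handle:
    """Hypothetical object whose hash has no uniqueness guarantee."""

    def __init__(self, name: str):
        self.name = name

    def __hash__(self) -> int:
        # Deliberately tiny range (0..15): collisions are certain for 40 objects.
        return sum(map(ord, self.name)) % 16
    # __eq__ is inherited from object, so equality means identity.


handles = [Handle(f"h{i}") for i in range(40)]

# Fragile: treats the hash code as an object identifier.
by_hash = {hash(h): h for h in handles}

# Robust: the dict uses the hash only to choose a bucket, then checks equality.
by_object = {h: h.name for h in handles}

# by_hash has at most 16 entries; by_object has all 40.
```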
I could fill a lot more pages with this sort of list. And
our platform is still in its infancy. Anyway,
one clear message from all this is that things will change and then applications will
break.
But can we categorize these failures and make some sense
of it all? For each failure, we need to decide whether the platform or the application is at fault. And
then we need to identify some rules or mechanisms that can avoid these failures or
mitigate them. I see four categories.
Category 1: The application explicitly screws itself
The easiest category to dispense with is the one where
a developer intentionally and explicitly takes advantage of a behavior that s/he knows
is guaranteed to change. A perfect example
of this is #8 above. Anyone who navigates
through private members to unmanaged internal data structures is setting himself up
for problems in future versions. The
responsibility (or irresponsibility in this case) lies with the application. In
my opinion, the platform should have no obligations.
But consider #5 above. It's clearly in this same category, and yet opinions on our larger team were quite divided on whether we needed to fix the problem. I spoke to a number of people who definitely understood the incredible difficulty of keeping this application running on new versions of the CLR and EnterpriseServices. But they consistently argued that the operating system has traditionally held itself to this sort of compatibility bar, that this is one of the reasons for Windows' ubiquity, and that the managed platform must similarly step up.
Also, we have to be realistic here. If
a customer issue like this involves one of our largest accounts, or has been escalated
through a very senior executive (a surprising number seem to reach Steve Ballmer),
then we're going to pull out all the stops on a fix or a temporary workaround.
In many cases, our side-by-side support is an adequate
and simple solution. Customers can continue
to run problematic applications on their old bits, even though a new version of these
bits has also been installed. For instance,
the config file for an application can specify an old version of the CLR. Or
binding redirects could roll back a specific assembly. But
this technique falls apart if the application is actually an add-in that is dynamically
loaded into a process like Internet Explorer or SQL Server. It's unrealistic to lock back the entire managed stack inside Internet Explorer (possibly preventing newer applications that use generics or other Whidbey features from running there), just so older questionable applications can keep running.
It's possible that we could provide lock-back at finer-grained
scopes than the process scope in future versions of the CLR. Indeed,
this is one of the areas being explored by our versioning team.
Anyway, if we were under sufficient pressure I could
imagine us building a one-time QFE (patch) for an important customer in this category,
to help them transition to a newer version and more maintainable programming techniques. But
if you aren't a Fortune 100 company or Steve Ballmer's brother-in-law, I personally
hope we would be allowed to ignore any of your applications that are in this category.
Category 2: The platform explicitly screws the application
I would put #6, #7 and #11 above in a separate category. Here,
the platform team wants to make an intentional breaking change for some valid reason
like performance or reliability. In fact,
#10 above is a very special case of this category. In
#10, we would like to break compatibility in Whidbey so that we can provide a stronger
model that can avoid subsequent compatibility breakage. It's a paradoxical notion that we should break compatibility now so we can increase future compatibility, but the approach really is sensible.
Anyway, if the platform makes a conscious decision to
break compatibility to achieve some greater goal, then the platform is responsible
for mitigation. At a minimum, we should
provide a way for broken applications to obtain the old behavior, at least for some
transition period. We have a few choices
in how to do this, and we're likely to pick one based on engineering feasibility,
the impact of a breakage, the likelihood of a breakage, and schedule pressure:
- Rely on side-by-side and explicit administrator intervention. In
other words, the admin notices the application no longer works after a platform upgrade,
so s/he authors a config file to lock the application back to the old platform bits. This
approach is problematic because it requires a human being to diagnose a problem and
intervene. Also, it has the problems
I already mentioned with using side-by-side on processes like Internet Explorer or
SQL Server.
- For some changes, it shouldn't be necessary to lock back the entire platform stack. Indeed, for many changes the platform could simultaneously support the old and new behaviors. If we change our default policy for dealing with unhandled exceptions, we should definitely retain the old policy, at least for one release cycle.
- If we expect a significant percentage of applications to break when we make a change, we should consider an opt-in policy for that change. This eliminates the breakage and the human involvement in a fix. In the case of String Interning, we require each assembly to opt in to the new non-interned behavior.
- In some cases, we've toyed with the idea of having the opt-in be implicit with a recompile. The logic here is that when an application is recompiled against new platform bits, it is presumably also tested against those new bits. The developer, rather than the admin, will deal with any compatibility issues that arise. We're well set up for this, since managed assemblies contain metadata giving us the version numbers of the CLR and the dependent assemblies they were compiled against. Unfortunately,
execution models like ASP.NET work against us here. As
you know, ASP.NET pages are recompiled automatically by the system based on dependency
changes. There is no developer available
when this happens.
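The last two options, explicit opt-in and implicit opt-in on recompile, come down to a small decision rule per assembly. Here is a minimal Python sketch of that rule. `AssemblyInfo`, `use_new_behavior` and the version constant are all hypothetical. It assumes the "compiled against" version is read from the assembly's metadata, as described above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical: the platform version that introduced the breaking change.
NEW_BEHAVIOR_SINCE: Tuple[int, int] = (2, 0)


@dataclass(frozen=True)
class AssemblyInfo:
    name: str
    # Runtime version the assembly was compiled against (from its metadata).
    compiled_against: Tuple[int, int]
    # Explicit opt-in (True) or opt-out (False), e.g. via an attribute; None if absent.
    explicit_choice: Optional[bool] = None


def use_new_behavior(asm: AssemblyInfo, implicit_on_recompile: bool = True) -> bool:
    """Decide whether an assembly gets the new (breaking) behavior."""
    if asm.explicit_choice is not None:
        return asm.explicit_choice  # an explicit choice always wins
    if implicit_on_recompile:
        # Recompiled against the new bits, so presumably tested against them.
        return asm.compiled_against >= NEW_BEHAVIOR_SINCE
    return False  # pure opt-in: keep the old behavior unless asked
```

The ASP.NET problem shows up directly in this rule. A page recompiled automatically by the system would get a new `compiled_against` value without any developer having tested it. So the implicit branch would opt it in wrongly.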
Windows Shimming
Before we look at the next two categories of AppCompat
failure, it's worth taking a very quick look at one of the techniques that the operating
system has traditionally used to deal with these issues. Windows
has an AppCompat team which has built something called a shimming engine.
Consider what happened when the company tried to move
consumers from Win95/Win98/WinMe over to WinXP. They
discovered a large number of programs which used the GetVersion or the preferred GetVersionEx
APIs in such a way that the programs refused to run on NT-based systems.
In fact, WinXP did such a good job of achieving compatibility with Win9X systems that in many cases the only reason the application wouldn't run was the version check that the program made at startup. The fix was to change GetVersion
or GetVersionEx to lie about the version number of the current operating system. Of
course, this lie should only be told to programs that need the lie in order to work
properly.
I've heard that this shim, which lies about the operating system version, is the most commonly applied shim we have. As
I understand it, at process launch the shimming engine tries to match the current
process against any entries in its database. This
match could be based on the name, timestamp or size of the EXE, or of other files
found relative to that EXE like a BMP for the splash screen in a subdirectory. The
entry in the database lists any shims that should be applied to the process, like
the one that lies about the version. The
shimming engine typically bashes the IAT (import address table) of a DLL or EXE in
the process, so that its imports are bound to the shim rather than to the normal export
(e.g. Kernel32!GetVersionEx). In addition,
the shimming engine has other tricks it performs less frequently, like wrapping COM objects up with intercepting proxies.
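The match-then-patch flow described above can be modeled in a few lines. This Python sketch is only a toy. The real engine patches the import address tables of loaded PE modules, while here the "IAT" is a dict from API name to function. `MatchEntry`, `SHIMS`, `bind_imports` and the `OLDGAME.EXE` entry are hypothetical. The version numbers are the real ones: Windows 98 reports 4.10 and Windows XP reports NT 5.1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Version = Tuple[int, int]


def real_get_version() -> Version:
    return (5, 1)   # Windows XP reports NT version 5.1


def win98_version_lie() -> Version:
    return (4, 10)  # Windows 98 reported version 4.10


# Hypothetical shim library: shim name -> (imported API it replaces, replacement).
SHIMS: Dict[str, Tuple[str, Callable[[], Version]]] = {
    "Win98VersionLie": ("GetVersionEx", win98_version_lie),
}


@dataclass(frozen=True)
class MatchEntry:
    exe_name: str
    exe_size: Optional[int]  # None means "match any size"
    shims: Tuple[str, ...]


def bind_imports(exe_name: str, exe_size: int,
                 database: List[MatchEntry]) -> Dict[str, Callable[[], Version]]:
    """Bind a toy import table, then patch slots for any matching database entry."""
    iat: Dict[str, Callable[[], Version]] = {"GetVersionEx": real_get_version}
    for entry in database:
        if entry.exe_name.lower() == exe_name.lower() and entry.exe_size in (None, exe_size):
            for shim in entry.shims:
                api, replacement = SHIMS[shim]
                iat[api] = replacement  # the lie is told only to matched processes
    return iat


DATABASE = [MatchEntry("OLDGAME.EXE", 1_234_567, ("Win98VersionLie",))]
```

Only the process matching both name and size sees version 4.10. Every other process binds to the real export. That is why the matching criteria have to be crafted carefully for each application.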
It s easy to see how this infrastructure can allow applications
for Win95 to execute on WinXP. However,
this approach has some drawbacks. First,
it's rather labor-intensive. Someone has to debug the application, determine which shims will fix it, and then craft some suitable matching criteria that will identify this application in the shimming database. If an appropriate shim doesn't already exist, it must be built.
In the best case, the application has some commercial
significance and Microsoft has done all the testing and shimming. But
if the application is a line-of-business application that was created in a particular company's IT department, Microsoft will never get its hands on it. I've heard we're now allowing sophisticated IT departments to set up their own shimming databases for their own applications, but this only allows them to apply existing shims to their applications.
And from my skewed point of view, the worst part of all this is that it really won't work for managed applications. For
managed apps, binding is achieved through strong names, Fusion and the CLR loader. Binding
is practically never achieved through DLL imports.
So it's instructive to look at some of the techniques the operating system has traditionally used. But those techniques don't necessarily apply directly to our new problems.
Anyway, back to our categories…
Category 3: The application accidentally screws itself
Category 4: The platform accidentally screws the application
Frankly, I'm having trouble distinguishing these two cases. They are clearly distinct categories, but it's a judgment call where to draw the line. The
common theme here is that the platform has accidentally exposed some consistent behavior
which is not actually a guaranteed contract. The
application implicitly acquires a dependency on this consistent behavior, and is broken
when the consistency is later lost.
In the nirvana of some future fully managed execution
environment, the platform and tools would never expose consistent behavior unless
it was part of a guarantee. Let's look at some examples and see how practical this is.
In example #1 above, reflection used to deliver members
in a stable order. In Whidbey, that order
changes. In hindsight, there's a simple
solution here. V1 of the product could
have contained a testing mode that randomized the returned order. This
would have exposed the developer to our actual guarantees, rather than to a stronger
accidental consistency. Within the CLR,
we've used this sort of technique to force us down code paths that otherwise wouldn't
be exercised. For example, developers
on the CLR team all use NT-based (Unicode) systems and avoid Win9X (Ansi) systems. So
our Win9X Ansi/Unicode wrappers wouldn't typically get tested by developers. To
address this, our checked/debug CLR build originally considered the day of the week
and used Ansi code paths every other day. But
imagine chasing a bug at 11:55 PM. When the bug magically disappears on
your next run at 1:03 AM the next morning, you are far too frazzled to think clearly
about the reason. Today,
we tend to use the low-order bits of the size of an image like mscorwks.dll or the
assembly being tested. A given build therefore takes the same code paths on every run,
so our randomization is now much friendlier to testing.
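Here is a minimal sketch of the kind of testing mode described above, written in Python purely for illustration. The functions `get_members` and `seed_from_image_size` are hypothetical, not CLR or reflection APIs. The sketch assumes the only contract is "every member is returned" with no ordering promise, and it shuffles with a seed taken from the low-order bits of a file size, so a perturbed run is repeatable instead of depending on the clock.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def seed_from_image_size(image_size: int) -> int:
    """Hypothetical: derive a perturbation seed from the low byte of a file size.

    A given build always has the same size, so the choice is repeatable
    from run to run, unlike a seed taken from the clock or the weekday.
    """
    return image_size & 0xFF


def get_members(members: Iterable[T], *, perturb: bool = False, seed: int = 0) -> List[T]:
    """Hypothetical: return every member; ordering is NOT part of the contract.

    With perturb=False the accidental (declaration) order leaks through.
    With perturb=True a copy is shuffled with a fixed seed, so a test run
    exposes any caller that silently depends on the accidental order.
    """
    result = list(members)  # copy, so the caller's sequence is never mutated
    if perturb:
        random.Random(seed).shuffle(result)
    return result


if __name__ == "__main__":
    declared = ["Open", "Read", "Write", "Close"]
    seed = seed_from_image_size(1_234_567)  # hypothetical image size
    print(get_members(declared))                           # declaration order
    print(get_members(declared, perturb=True, seed=seed))  # same order every run for this seed
```

A caller that grabs `get_members(...)[0]` expecting a particular member works by accident in normal mode and may fail under perturbation, which is exactly the signal we want during testing rather than after the next version ships.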
In example #2 above, you could imagine a similar perturbation
on our AutoLayout algorithms when executing a debug version of an application, or
when launched from inside a tool like Visual Studio.
For example #4, the CLR already has internal stress modes
that force different and aggressive GC schedules. These
can guarantee compaction to increase the likelihood of detecting stale references. They
can perform extensive checks of the integrity of the heap, to ensure that the write
barrier and other mechanisms are effective. And
they can ensure that every instruction of JITted managed code that can synchronize
with the GC will synchronize with the GC. I
suspect that these modes would do a partial job of eradicating assumptions about lifetimes
reported by the JIT. However, we will
remain exposed to significantly different code generators (like Rotor's FJIT) or
execution on significantly different architectures (like CPUs with dramatically more
registers).
In contrast with the above difficulty, it's easy to
imagine adding a new GC stress mode that perturbs the finalization queues, to uncover
any hidden assumptions about finalization order. This
would address example #3.
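A finalization-order stress mode could look roughly like the following toy model. This is a hypothetical Python sketch, not how the CLR's finalizer thread or GC stress modes actually work: it only shows the idea of keeping the real guarantee (each finalizer runs) while deliberately scrambling the accidental one (first-in, first-out order).

```python
import random
from typing import Callable, List, Tuple


class FinalizationQueue:
    """Hypothetical toy finalization queue with an optional stress mode.

    The only promise is that each registered finalizer runs once.
    In normal mode they happen to run in registration order; stress mode
    shuffles that order so hidden ordering assumptions show up in testing.
    """

    def __init__(self, stress: bool = False, seed: int = 0) -> None:
        self._pending: List[Tuple[str, Callable[[], None]]] = []
        self._stress = stress
        self._rng = random.Random(seed)  # seeded, so a failing run can be replayed

    def register(self, name: str, finalizer: Callable[[], None]) -> None:
        self._pending.append((name, finalizer))

    def run_finalizers(self) -> List[str]:
        batch, self._pending = self._pending, []
        if self._stress:
            self._rng.shuffle(batch)  # perturb the accidental FIFO order
        ran = []
        for name, finalizer in batch:
            finalizer()
            ran.append(name)
        return ran  # names in the order the finalizers actually ran
```

Imagine an application whose hypothetical "writer" finalizer flushes into a "file" object that must still be open. It works whenever registration order happens to be preserved; under stress mode some seeds will run the file's finalizer first and expose the hidden dependency.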
Customer Debug Probes, AppVerifier and other tools
It turns out that the CLR already has a partial mechanism
for enabling perturbation during testing and removing it on deployed applications. This
mechanism is the Customer Debug Probes feature that we shipped in V1.1. Adam
Nathan's excellent blog site has a series of articles on CDPs, which are collected
together at http://blogs.gotdotnet.com/anathan/CategoryView.aspx/Debugging. The
original goal of CDPs was to counteract the black-box nature of debugging certain
failures of managed applications, like corruptions of the GC heap or crashes due to
incorrect marshaling directives. These
probes can automatically diagnose common application errors, like failing to keep
a marshaled delegate rooted so it won't be collected. This
approach is so much easier than wading through dynamically generated code without
symbols, because we tell you exactly where your bugs are. But
we're now realizing that we can also use CDPs to increase the future compatibility
of managed applications, if we can perturb current behavior that is likely to change
in the future.
Unfortunately, example #6 from above reveals a major
drawback with the technique of perturbation. When
we built the original implementation of Object.GetHashCode, we simply never considered
the difference between what we wanted to guarantee (hashing) and what we actually
delivered (OIDs). In hindsight, it is
obvious. But I'm not convinced that
we aren't falling into similar traps in our new features. We
might be a little smarter than we were five years ago, but only a little.
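The gap between "hashing" and "object identity" is easy to demonstrate. The following Python sketch is only an analogy for Object.GetHashCode; the `Point` class and both helper functions are hypothetical. A perfectly legal hash function can collide for distinct objects, so code that treats hash codes as unique IDs silently loses data.

```python
class Point:
    def __init__(self, x: int, y: int) -> None:
        self.x, self.y = x, y

    def __eq__(self, other: object) -> bool:
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self) -> int:
        # Legal: equal points always hash equally, which is the whole contract.
        # Distinct points may collide, e.g. Point(1, 2) and Point(2, 1).
        return hash(self.x ^ self.y)


def index_by_hash(objs):
    # BUG: treats hash codes as unique object IDs, so collisions drop entries.
    return {hash(o): o for o in objs}


def index_by_identity(objs):
    # id() is documented as unique among simultaneously live objects.
    return {id(o): o for o in objs}


pts = [Point(1, 2), Point(2, 1)]
print(len(index_by_hash(pts)))      # 1: one point silently lost
print(len(index_by_identity(pts)))  # 2
```

The hash function here is weak but correct; the bug lives entirely in the caller's assumption. Perturbation can only help once someone has recognized that the guarantee is hashing rather than identity.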
Example #10 worries me for similar reasons. I
just don't think we were smart enough to predict that changing the binding configuration
of an AppDomain after starting to execute code in that AppDomain would be so fragile. When
a developer delivers a feature, s/he needs to consider security, thread-safety, programming
model, key invariants of the code base like GC reporting, correctness, and so many
other aspects. It would be amazing if
a developer consistently nailed each of these aspects for every new feature. We're
kidding ourselves if we think that evolution and unintentional implicit contracts
will get adequate developer attention on every new feature.
Even if we had perfect foresight and sufficient resources
to add perturbation for all operations, we would still have a major problem. We
can't necessarily rely on third-party developers to test their applications
with perturbation enabled. Consider the
unmanaged AppVerifier experience.
The operating system has traditionally offered a dynamic
testing tool called AppVerifier that can diagnose many common unmanaged application
bugs. And thanks to uploads
of Watson process dumps from the field, most unmanaged application crashes can now
be attributed to incorrect usage of dynamically allocated memory. Yet
AppVerifier can use techniques like placing each allocation in its own page or leaving
pages unmapped after release, to deterministically catch overruns, double frees, and
reads or writes of freed memory.
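To make the page-per-allocation idea concrete, here is a toy model in Python. It is a hypothetical simulation: the `GuardedHeap` class is invented for illustration, and AppVerifier works on real virtual memory, not Python integers. Still, it shows why never reusing freed pages and right-aligning each block against a guard page turns silent corruption into an immediate, deterministic failure.

```python
class HeapError(Exception):
    """Raised when the toy checker catches a memory-usage bug."""


class GuardedHeap:
    """Hypothetical page-heap model; addresses are plain integers.

    Each block gets its own page, right-aligned so the first byte past the
    end falls on an inaccessible guard page, and freed pages are never reused.
    """

    PAGE = 4096

    def __init__(self) -> None:
        self._next_page = 0
        self._live = {}      # block address -> size in bytes
        self._freed = set()  # addresses whose pages stay "unmapped" forever

    def alloc(self, size: int) -> int:
        if not 0 < size <= self.PAGE:
            raise ValueError("toy model supports 1..PAGE bytes")
        addr = self._next_page * self.PAGE + (self.PAGE - size)
        self._next_page += 2  # one data page plus one guard page
        self._live[addr] = size
        return addr

    def free(self, addr: int) -> None:
        if addr in self._freed:
            raise HeapError("double free")
        if addr not in self._live:
            raise HeapError("free of unknown pointer")
        del self._live[addr]
        self._freed.add(addr)  # never handed out again, so stale pointers keep failing

    def access(self, addr: int, offset: int) -> None:
        if addr in self._freed:
            raise HeapError("read/write of freed memory")
        size = self._live.get(addr)
        if size is None:
            raise HeapError("wild pointer")
        # A real right-aligned page heap catches overruns; the model checks
        # both bounds for simplicity.
        if not 0 <= offset < size:
            raise HeapError("out-of-bounds access")
```

In this model every small allocation consumes two pages of address space, which illustrates why this style of checking belongs in a testing mode rather than in production.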
In other words, there is hard evidence that if every
unmanaged application had just used the memory checking support of AppVerifier, then
two out of every three application crashes would be eliminated. Clearly
this didn't happen.
Of course, AppVerifier can diagnose far more than just
memory problems. And it's very easy
and convenient to use.
Since testing with AppVerifier is part of the Windows
Logo compliance program, you would expect that it's used fairly rigorously by ISVs. And,
given its utility, you would expect that most IT organizations would use this tool
for their internal applications. Unfortunately,
this isn't the case. Many applications
submitted for the Windows Logo actually fail to launch under AppVerifier. In
other words, they violate at least one of the rules before they finish initializing.
The Windows AppCompat team recognizes that proactive
tools like AppVerifier are so much better than reactive mitigation like shimming broken
applications out in the field. That's
why they made the AppVerifier tool a major focus of their poorly attended Application
Compatibility talk that I sat in on at the PDC. (Aha! I
really was going somewhere with all this.)
There's got to be a reason why developers don't use
such a valuable tool. In my opinion,
the reason is that AppVerifier is not integrated into Visual Studio. If
the Debug Properties in VS allowed you to enable AppVerifier and CDP checks, we would
have much better uptake. And if an integrated
project system and test system could monitor code coverage numbers, and suggest particular
test runs with particular probes enabled, we would be approaching nirvana.
Winding Down
Looking at development within Microsoft, one trend is
very clear: Automated tools and processes
are a wonderful supplement for human developers. Whether
we're talking about security, reliability, performance, application compatibility
or any other measure of software quality, we're now seeing that static and dynamic
analysis tools can give us guarantees that we will never obtain from human beings. Bill
Gates touched on this during his PDC keynote, when he described our new tools for
statically verifying device driver correctness, for some definition of correctness.
This trend was very clear to me during the weeks I spent
on the DCOM / RPCSS security fire drill. I
spent days looking at some clever marshaling code, eventually satisfying myself that
it worked perfectly. Then someone else
wrote an automated attacker and discovered real flaws in just a few hours. Other
architects and senior developers scrutinized different sections of the code. Then
some researchers from MSR who are focused on automatic program validation ran their
latest tools over the same code and gave us step-by-step execution models that led
up to crashes. Towards the end of the
fire drill, a virtuous cycle was established. The
code reviewers noticed new categories of vulnerabilities. Then
the researchers tried to evolve their tools to detect those vulnerabilities. Aspects
of this process were very raw, so the tools sometimes produced a great deal of noise
in the form of false positives. But it's
clear that we were getting real value from Day One and the future potential here
is enormous.
One question that always comes up, when we talk about
adding significant value to Visual Studio through additional tools, is whether Microsoft
should give away these tools. It's a
contentious issue, and I find myself going backwards and forwards on it. One
school of thought says that we should give away tools to promote the platform and
improve all the programs in the Windows ecology. In
the case of tools that make our customers' applications more secure or more resilient
to future changes in the platform, this is a compelling argument. Another
school of thought says that Visual Studio is a profit center like any other part of
the company, and it needs the freedom to charge what the market will bear.
Given that my job is building a platform, you might expect
me to favor giving away Visual Studio. But
I actually think the profit motive is a powerful mechanism for making our tools competitive. If
Visual Studio doesn't have P&L responsibility, their offering will deteriorate
over time. The best way to know whether
they've done all they can to make the best tools possible is to measure how much
their customers are willing to pay. I
want Borland to compete with Microsoft on building the best tools at the best price,
and I want to be able to measure the results of that competition through revenue and
market penetration.
In all this, I have avoided really talking about the
issues of versioning. Of course, versioning
and application compatibility are enormously intertwined. Applications
break for many reasons, but the typical reason is that one component is now binding
to a new version of another component. We
have a whole team of architects, gathered from around the company, who have been meeting
regularly for about a year to grapple with the problems of a complete managed versioning
story. Unlike managed AppCompat, the
intellectual investment in managed versioning has been enormous.
Anyway, Application Compatibility remains a relatively
contentious subject over here. There's
no question that it's a hugely important topic that will have a big impact on
the longevity of our platform. But we
are still trying to develop techniques for achieving compatibility that will be more
successful than what Windows has done in the past, without limiting our ability to
innovate on what is still a very young execution engine and set of frameworks. I
have deliberately avoided talking about what some of those techniques might be, in
part because our story remains incomplete.
Also, we won't realize how badly AppCompat will bite
us until we can see a lot of deployed applications that are breaking as we upgrade
the platform. At that point, it's easier
to justify throwing more resources at the problem. But
by then the genie is out of the bottle... the deployed applications will already
depend on brittle accidents of implementation, so recovery will be painfully breaking. In
a world where we are always under intense resource and schedule pressure, the needs
of AppCompat must be balanced against performance, security, developer productivity,
reliability, innovation and all the other "must-haves".
You know, I really do want to talk about Hosting. It
is a truly fascinating subject. I'm
much more comfortable talking about non-preemptive fiber scheduling than I am talking
about uninteresting topics like implicit contracts and compatibility trends.
But Hosting is going to have to wait at least a few more
weeks.