1489 - PackageHash changes on Nuget.Server

Nov 2, 2011 at 10:16 PM

Hi there – long-time fan, first-time contributor here :)

I just submitted a pull request with a fix for this issue:

http://nuget.codeplex.com/workitem/1489

http://nuget.codeplex.com/SourceControl/network/Forks/stovellp/FixingBug1489HashThreadSafety/contribution/1619

As some background, my product has a page that makes multiple calls to NuGet.Server at the same time to get version information for a list of packages. If a file has been added to the server, or the server app pool is starting up, NuGet.Server attempts to calculate hashes for all of the files on disk. But the .NET framework class that it relies on to do the hashing is not thread safe – so when multiple requests are made during warm-up, the hashes get corrupted. Since they are cached, the only solution is an IISRESET. I imagine this would also cause problems on a team if you had a custom NuGet.Server, and 10 developers on your team decided to open the NuGet VS extension and check for updates at the same time before the cache was warmed up.
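
To illustrate the general shape of a fix – creating a fresh hash algorithm per call rather than sharing one instance across requests – here is a simplified, hypothetical sketch (the class and interface names are the ones used later in this thread; see the pull request for the actual change):

// Sketch only: SHA512 instances are not thread-safe, but a new instance per
// CalculateHash call means concurrent requests can't corrupt each other's state.
// (Uses System.Security.Cryptography.)
public class CryptoHashProvider : IHashProvider
{
    public byte[] CalculateHash(byte[] data)
    {
        using (var hashAlgorithm = SHA512.Create())
        {
            return hashAlgorithm.ComputeHash(data);
        }
    }
}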

I’ve had about a dozen customers report this problem to me, but only tonight was I able to track it down. The workarounds for it aren’t great, so it would be really cool if this change could be included in the upcoming (1.7?) release :)

Also, thanks for all the hard work - NuGet is causing some fundamental changes in the .NET world and that rocks!

Regards,

Paul Stovell

Deployment getting you down? Try Octopus, NuGet-based automated deployment for .NET

Web: octopusdeploy.com | Blog: paulstovell.com

Nov 2, 2011 at 10:46 PM

Hi Paul,

In your test, you use Parallel.ForEach to ‘prove’ that it works across multiple threads. The problem is that you haven’t actually proven that the test uses multiple threads. (Parallel.ForEach doesn’t always spin up extra threads.)

I generally add a couple of extra checks to my thread-safety tests – a guard on Environment.ProcessorCount, and tracking of the thread ids that were actually used:

[Test]
public void ShouldProcessOnMultipleThreads()
{
    if (Environment.ProcessorCount == 1)
        Assert.Inconclusive("This test requires multiple logical processors.");

    // Arrange
    var breaker = new CircuitBreaker();
    var source = Enumerable.Range(0, 100).ToArray();

    // Act
    var usedThreadIds = new List<int>();
    Reliable.ParallelForEach(
        breaker,
        source,
        elements => usedThreadIds.Add(Thread.CurrentThread.ManagedThreadId),
        new RetryOptions
        {
            AllowedRetries = 5,
            RetryInterval = TimeSpan.Zero
        },
        new ParallelOptions
        {
            MaxDegreeOfParallelism = 100
        });

    // Assert
    var distinctThreadIdCount = usedThreadIds.Distinct().Count();
    Assert.IsTrue(distinctThreadIdCount > 1);
}

(from http://hg.tath.am/reliability-patterns/src/5822f47a34bb/Tests/ReliableParallelForEachTests.cs)

These additions ensure that the test did actually create the scenario you were expecting.

-- Tatham

Developer
Nov 3, 2011 at 12:11 AM

I'm not very good with multithreading so correct me if I'm wrong - but neither the way the test is currently set up nor the additional checks you highlight tell us if the call to SHA512.ComputeHash is stomped on by another thread, which would possibly result in bad hashes being calculated. All they tell us is that there were multiple threads involved.

Nov 3, 2011 at 12:41 AM

It’s a brute force approach – it runs lots of calls across lots of threads and makes sure none of them break.

-- Tatham

Nov 3, 2011 at 1:04 AM

Hi Tatham,

I understand where you're going. However, I'm not sure that asserting multiple threads were used is a good idea - Parallel.For (or ThreadPool.QueueUserWorkItem) could choose to use multiple threads, or it could not, and either is valid. In your test above, I suspect there's a small chance that one day you'll run that test and it will fail, only to pass the next day, since it's perfectly valid for Parallel.For to not use multiple threads. In the original test there's an equal chance the test will run and not test anything, but at least it won't fail. Is a test that randomly fails better than a test that randomly passes without testing anything? I think there are arguments to be made for both sides.

I guess a few trade-offs come into it:

  • What are the chances multiple threads won't be used? I chose a large list of items just to improve the odds of getting multiple threads but agree it's not guaranteed. 
  • In 1 out of every 1000 runs, do you want a test that fails (only to succeed the next time you run the test), or a test that passes without testing anything (only to test everything the next time it runs)? 
  • How clear do you want the code to be?

My goal for the test was just to help ensure the bug doesn't come back. In this case I actually don't mind if it has a small chance of passing even when the bug is reintroduced, because I'm sure the NuGet team run their tests a few times a day, so it would eventually get picked up. On the other hand, I would have felt uncomfortable committing a test that had the potential to fail one in every 1000 test runs, leaving the maintainers to figure out why.

I think the "correct" approach would be to new up actual Thread objects and use WaitHandles to check each iteration is done, to guarantee you're getting multiple underlying threads. But I think that would detract from the readability. So it's all a trade-off.
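
For illustration only, here's a rough, hypothetical sketch of that shape – real Thread objects, with a WaitHandle per thread to check each one is done (it just hammers the CryptoHashProvider discussed in this thread; it's not code from the pull request):

// Sketch only, untested. Uses System.Collections.Concurrent, System.Linq,
// System.Text and System.Threading.
byte[] testBytes = Encoding.UTF8.GetBytes("There is no butter knife");
IHashProvider hashProvider = new CryptoHashProvider();
byte[] expectedHash = hashProvider.CalculateHash(testBytes);   // reference value, computed single-threaded

var corruptedThreadIds = new ConcurrentQueue<int>();
var doneHandles = new List<ManualResetEvent>();

for (int t = 0; t < 8; t++)
{
    var done = new ManualResetEvent(false);
    doneHandles.Add(done);

    new Thread(() =>
    {
        for (int i = 0; i < 1000; i++)
        {
            if (!hashProvider.CalculateHash(testBytes).SequenceEqual(expectedHash))
                corruptedThreadIds.Enqueue(Thread.CurrentThread.ManagedThreadId);
        }
        done.Set();                        // this thread's iterations are finished
    }).Start();
}

foreach (var done in doneHandles)
    done.WaitOne();                        // block until every thread has signalled completion

Assert.True(corruptedThreadIds.IsEmpty,
    "Hashes were corrupted on threads: " + string.Join(", ", corruptedThreadIds.Distinct()));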

PS: I don't think the check for ProcessorCount == 1 is necessary - a single core/processor can still execute multiple threads simultaneously. 

Paul

Nov 3, 2011 at 1:20 AM

Is a test that randomly fails better than a test that randomly passes without testing anything?

I agree it’s a trade-off; I just chose the other side of the fence. :)

I would have felt uncomfortable committing a test that had the potential to fail one in every 1000 test runs, leaving the maintainers to figure out why.

The approach I suggested can provide a clear message about the threading failure: just add a message to the Assert.IsTrue explaining that the test relies on multiple threads but only one was ever spawned.
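
Something like this (just the assertion line from the earlier test, with a message added):

Assert.IsTrue(distinctThreadIdCount > 1,
    "This test relies on multiple threads, but only one was ever spawned.");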

PS: I don't think the check for ProcessorCount == 1 is necessary - a single core/processor can still execute multiple threads simultaneously.

I’m 110% aware that you can run multiple threads on a single core. My experience on virtualised build servers with a single logical core is that the Parallel.ForEach logic won’t result in multiple threads. I’m just short-circuiting the test early.

-- Tatham

Nov 3, 2011 at 1:26 AM

Actually ... new idea.

How about:


[Fact]
public void CryptoHashProviderIsThreadSafe()
{
    // Arrange
    byte[] testBytes = Encoding.UTF8.GetBytes("There is no butter knife");
    string expectedHash = "xy/brd+/mxheBbyBL7i8Oyy62P2ZRteaIkfc4yA8ncH1MYkbDo+XwBcZsOBY2YeaOucrdLJj5odPvozD430w2g==";
    IHashProvider hashProvider = new CryptoHashProvider();

    // Thread-safe collection, since every worker thread records its id here
    var usedThreadIds = new ConcurrentBag<int>();
    Parallel.For(0, 10000, ignored =>
    {
        usedThreadIds.Add(Thread.CurrentThread.ManagedThreadId);

        // Act
        byte[] actualHash = hashProvider.CalculateHash(testBytes);

        // Assert
        Assert.Equal(Convert.FromBase64String(expectedHash), actualHash);
    });

    // (Assert.Inconclusive comes from MSTest/NUnit; an xUnit [Fact] would need an equivalent signal)
    if (usedThreadIds.Distinct().Count() == 1)
        Assert.Inconclusive("This test relies on multiple threads, however only one was ever spawned.");
}


This results in:

  • Pass if hashes are consistently generated across multiple threads
  • Fail if the bug regresses
  • Inconclusive if the test didn't test anything

That seems like the correct set of outcomes to me.

-- Tatham

Nov 3, 2011 at 11:22 AM

Ah! Brilliant. I've been testing a similar fix - but I'd made the whole call thread-safe rather than just the crypto (and I hadn't figured out that it was exactly that part that wasn't thread-safe). Please do push this out soon. :-) If you use the IDE to look at packages and have a big package or two in your repository, you'll run into this very easily with even one developer.

Nov 4, 2011 at 10:26 AM

Just looking over this - hash values are stored in a dictionary, and access to a dictionary is not protected by the underlying .NET platform - so it isn't just the hash calculation you have to protect; access to the dictionary also has to be completely protected. Fortunately, that is localized to one class (http://msdn.microsoft.com/en-us/library/xfhwa508.aspx).

Developer
Nov 4, 2011 at 10:36 AM

Change it to use a ConcurrentDictionary and see if that helps.
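
Something along these lines, roughly (the names here are illustrative, not NuGet.Server's actual classes):

// Sketch only: cache package hashes in a ConcurrentDictionary so that concurrent
// readers and writers can't corrupt the cache. GetOrAdd may invoke the value
// factory more than once under contention, but only one result ends up stored.
// (Uses System, System.Collections.Concurrent, System.IO.)
public class PackageHashCache
{
    private readonly ConcurrentDictionary<string, byte[]> _hashes =
        new ConcurrentDictionary<string, byte[]>(StringComparer.OrdinalIgnoreCase);

    private readonly IHashProvider _hashProvider = new CryptoHashProvider();

    public byte[] GetHash(string path)
    {
        return _hashes.GetOrAdd(path, p => _hashProvider.CalculateHash(File.ReadAllBytes(p)));
    }
}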