NuGet Performance Scaling

Jan 11, 2012 at 6:25 PM

We have been using a private NuGet feed in our CI environment since about September. Lately, the service has gotten much, much slower. It used to take a few minutes to install/update packages (we don't check them into source control) using the TeamCity plugin. Now updating packages can take 40 minutes in the worst case, and typically takes 20 minutes on our larger projects.

Here are some stats for our environment:

Total packages: about 5,500

Unique packages: about 40

Developers: about 40

Hardware: physical machine, Windows Server 2008 R2, dual quad-core Xeon processors (16 logical cores), 32 GB RAM. Packages and server are on an SSD.

This is massively slowing down our CI. Projects that used to build in minutes can now take 20 minutes. Developers dread getting the latest packages because it is painfully slow. When I watch the NuGet server, it can consume nearly 60% of the CPU at peak times (probably around 20 concurrent users, most of them our CI rather than developers). Our larger solutions may have 30 projects, and each of those projects may reference up to 15 NuGet packages.

I have done some performance monitoring on my machine to see what is happening. An update call (i.e., nuget update <solution>) for one of our larger projects calls into GetPackages on the web server over 450 times, and each of those calls seems to touch all 5,500 of our packages.

Here is a screenshot of the summary from profiling the Update on my desktop machine:

http://imageshack.us/photo/my-images/851/nugetprofile.png/

At the top of the function list are NuGet.LocalPackageRepository.GetPackage, System.IO.Packaging.Package.Open, System.IO.Directory.Exists, and System.IO.FileSystemInfo.get_LastWriteTimeUtc.
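To make the cost concrete, here is a rough sketch of what the profile suggests each GetPackages call is doing. The class and method names are illustrative, not the actual NuGet.Server internals:

```csharp
// Rough sketch of the per-call hot path the profile points at: every
// GetPackages request re-enumerates the package directory and cracks
// open each .nupkg. Names here are illustrative, not NuGet.Server source.
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging; // in WindowsBase.dll

static class PackageScanSketch
{
    public static IEnumerable<string> ScanAll(string packageRoot)
    {
        foreach (var path in Directory.EnumerateFiles(packageRoot, "*.nupkg"))
        {
            // These show up high in the profile: a timestamp read per file,
            // per request.
            var lastWrite = File.GetLastWriteTimeUtc(path);

            // And the expensive part: opening the zip to get at metadata.
            using (Package.Open(path, FileMode.Open, FileAccess.Read))
            {
                yield return Path.GetFileName(path) + " @ " + lastWrite.ToString("u");
            }
        }
        // With ~5,500 packages and 450+ GetPackages calls per update, this
        // loop amounts to millions of file operations per solution update.
    }
}
```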

On installs, a huge amount of time is spent calculating the hash. We will probably fork the source to make hashing optional. In our use case, the overhead of calculating the hash is not really justifiable: everything goes across our corporate network, and we are not too concerned about the integrity of that communication. Removing the hash calculation greatly sped up the install process.
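For what it's worth, the change we have in mind is roughly the following. The appSettings key and surrounding code are hypothetical, not actual NuGet source; the expensive part being skipped is the Base64-encoded SHA512 that NuGet computes over the whole package:

```csharp
// Hypothetical sketch: guard the hash computation behind a config flag.
// The key name "enablePackageHash" is made up for illustration.
using System;
using System.Configuration;
using System.IO;
using System.Security.Cryptography;

static class OptionalHashSketch
{
    static readonly bool HashingEnabled =
        !string.Equals(ConfigurationManager.AppSettings["enablePackageHash"],
                       "false", StringComparison.OrdinalIgnoreCase);

    public static string ComputeHash(string nupkgPath)
    {
        if (!HashingEnabled)
            return string.Empty; // skip the full pass over the file

        using (var sha = SHA512.Create())
        using (var stream = File.OpenRead(nupkgPath))
        {
            return Convert.ToBase64String(sha.ComputeHash(stream));
        }
    }
}
```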

I was wondering if anyone has any thoughts on the performance of a private feed. I created the feed from the 1.6.1 tag in the repository. We use the command line to install/update packages, and it is the latest publicly available version (1.6.21205.9031).

Is there any way to cache the file system info more aggressively? Is this already implemented somehow and I just need to turn it on? Is there a way to minimize the service chatter during an update?
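To sketch what I mean by caching the file system info (purely illustrative, and assuming a package only changes when its file timestamp does):

```csharp
// Purely illustrative: memoize per-file metadata keyed by path and
// invalidated by LastWriteTimeUtc, so repeated GetPackages calls don't
// re-open every .nupkg.
using System;
using System.Collections.Concurrent;
using System.IO;

sealed class CacheEntry
{
    public DateTime LastWriteUtc;
    public string Metadata; // whatever was parsed out of the .nupkg
}

static class FileInfoCacheSketch
{
    static readonly ConcurrentDictionary<string, CacheEntry> Cache =
        new ConcurrentDictionary<string, CacheEntry>();

    public static string GetMetadata(string path, Func<string, string> parsePackage)
    {
        var stamp = File.GetLastWriteTimeUtc(path);
        var entry = Cache.GetOrAdd(path, _ => new CacheEntry());
        lock (entry)
        {
            if (entry.Metadata == null || entry.LastWriteUtc != stamp)
            {
                entry.Metadata = parsePackage(path); // only on first use or change
                entry.LastWriteUtc = stamp;
            }
            return entry.Metadata;
        }
    }
}
```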

Developer
Jan 11, 2012 at 6:48 PM

To address your concerns, I have a changeset that drops the hash calculation (and also allows for cheaper non-cryptographic hash algorithms).
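A minimal sketch of what a pluggable hash could look like, assuming the algorithm name comes from configuration; this is illustrative and may not match the actual changeset:

```csharp
// Illustrative only. HashAlgorithm.Create resolves well-known names like
// "SHA512" or "MD5"; a non-cryptographic option would need its own
// HashAlgorithm implementation registered under a name.
using System;
using System.IO;
using System.Security.Cryptography;

static class PluggableHashSketch
{
    public static string Compute(string path, string algorithmName)
    {
        // HashAlgorithm.Create returns null for unknown names.
        using (var algorithm = HashAlgorithm.Create(algorithmName))
        {
            if (algorithm == null)
                throw new ArgumentException("Unknown hash algorithm: " + algorithmName);
            using (var stream = File.OpenRead(path))
                return Convert.ToBase64String(algorithm.ComputeHash(stream));
        }
    }
}
```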

However, the fundamental problem is that our caching is short-lived and in-memory, and cracking open a package is expensive. Have you considered using the gallery instead of NuGet.Server? Our long-term plan is to bring it on par with the ease of publishing that NuGet.Server offers right now (publishing by dropping into a directory), but since the feed itself is backed by a database, it's definitely much quicker and doesn't have some of the pain points NuGet.Server has.

Jan 11, 2012 at 6:58 PM

I'll try installing the gallery again. The first time I tried, about four months ago, it was a nightmare and I didn't get it to work. Setting up the simple server was child's play comparatively.

Developer
Jan 11, 2012 at 7:05 PM

We migrated our Gallery codebase recently and it's much simpler to set up now. It's hosted on GitHub at https://github.com/NuGet/NuGetGallery/. A bit ironic that we don't have a package for it as yet, but that's entirely my fault.

Jan 25, 2012 at 2:24 PM

Thanks for your help on this. We successfully got NuGet Gallery up and running from the GitHub code. Performance is much improved and scales far better.

When we set it up using the included script, it is helpful to know that the script requires SQL Express to be installed on the machine you run the script on, regardless of what you put in the web.config connection string. After we built it and got it running in IIS, we changed the connection string to point at a separate SQL Server 2008 R2 server, and it automatically created all the necessary tables for us.
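For anyone following along, the change was along these lines; the connection string name and server details below are placeholders, not necessarily what the Gallery's web.config actually uses:

```xml
<!-- Placeholder names: the key and server are illustrative. -->
<connectionStrings>
  <add name="NuGetGallery"
       connectionString="Data Source=OUR-SQL-SERVER;Initial Catalog=NuGetGallery;Integrated Security=True"
       providerName="System.Data.SqlClient" />
</connectionStrings>
```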

Also, we couldn't get email to work, so I disabled the email confirmation by editing the settings table in the database. I didn't dig into it because we don't need or use it, but it seemed to have something to do with security certificates and SMTP.
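The tweak amounted to flipping the confirmation flag off, something along these lines; the table and column names are my best guess, so check your own database before running anything:

```sql
-- Hypothetical: table and column names are guesses, not the verified
-- Gallery schema. The intent is just to turn off email confirmation.
UPDATE GallerySettings
SET ConfirmEmailAddresses = 0;
```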