Can you turn off the MachineCache (%LOCALAPPDATA%\NuGet\Cache)?

Sep 14, 2011 at 11:12 AM

Quick question on the MachineCache.  Tracing the code through, it appears that the DataServicePackage caches itself to the local machine cache (%LOCALAPPDATA%\NuGet\Cache).  I was looking to try and optionally turn the caching off, and it appears that I would have to make changes all the way down to the DataServicePackage level itself to do this.

Unsure as to whether I am missing something...is there any easy way to do this?

Developer
Sep 14, 2011 at 11:20 AM

There's no way to turn it off. Any reason why you want to?

Sep 14, 2011 at 11:32 AM

If you have:

  1. Two feeds with the same package IDs
  2. One has higher versions than the other
  3. You specify the source via nuget.exe -s

NuGet.exe will resolve from the cache rather than the remote service, and generally the higher version will be chosen regardless of the package source you specify.
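
A minimal sketch of why this happens, assuming resolution merges the machine cache with the named feed and takes the highest version. `PackageInfo`, `AggregateDemo` and `FindLatest` are hypothetical names for illustration, not NuGet types, and the merge logic is an assumption rather than a copy of the real code:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a package entry; not NuGet's IPackage.
class PackageInfo
{
    public string Id;
    public Version Version;
    public string Origin; // feed this copy was originally downloaded from

    public PackageInfo(string id, string version, string origin)
    {
        Id = id;
        Version = new Version(version);
        Origin = origin;
    }
}

static class AggregateDemo
{
    // Merges every repository and returns the highest version of the id.
    // Assumption: this models the behaviour described above, nothing more.
    public static PackageInfo FindLatest(string id, params IEnumerable<PackageInfo>[] repositories)
    {
        return repositories
            .SelectMany(r => r)
            .Where(p => p.Id == id)
            .OrderByDescending(p => p.Version)
            .FirstOrDefault();
    }

    static void Main()
    {
        // The machine cache already holds 2.0, fetched earlier from feed B.
        var machineCache = new List<PackageInfo> { new PackageInfo("Foo", "2.0", "FeedB") };
        // The feed named with -s only offers 1.0.
        var feedA = new List<PackageInfo> { new PackageInfo("Foo", "1.0", "FeedA") };

        var picked = FindLatest("Foo", machineCache, feedA);
        Console.WriteLine("{0} {1} from {2}", picked.Id, picked.Version, picked.Origin);
    }
}
```

Run as written, this prints `Foo 2.0 from FeedB` even though only feed A was requested.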

Perhaps another approach would be to have the cache use a subdirectory based on source URL?

Developer
Sep 14, 2011 at 12:00 PM

We can fix that so that it makes sure it prefers the one specified via -s rather than having a per source cache (which I don't like).

Sep 14, 2011 at 12:14 PM

Ideally, when you specify -s with a single source it would not return a package from the local cache that came from a different source.  When you say "prefers", it reads as though there could still be instances where a locally cached package with the same ID and similar/same version from a different source would still be used...is that what you meant?

Also, are you happy with this as just a discussion, or would it be better as an issue?

Developer
Sep 14, 2011 at 1:18 PM

Yes, the cache is still used but it would update the cache (for the same id and version) if the hash is different (that's how the impl of the dataservice package repository works). What you're seeing is a result of the command line itself using the MachineCache as a separate source (which is first in the list).

I think we can just copy this discussion to a bug now.

Sep 14, 2011 at 1:49 PM

Sorry, not sure I follow you.  I made changes to all the AggregateRepository instantiations to honour the useMachineCache boolean, so that there are no additional MachineCache repositories added when making a NuGet.exe "install" call (using an added -nocache flag on command line).  So there is only one MachineCache that I could find being instantiated, and it is in the DataServicePackage constructor.

Looking at the code snippets below, does the sourceRepository.FindPackage on a DataServicePackageRepository result in a DataServicePackage being cached?  If so, it seems that (in the absence of an explicit ConstraintProvider) you will always get a later version of the same package id when it is already cached locally from a different source?  Would this be the case regardless of id/version/hash?  I am probably missing something here.

 

NuGet.PackageHelper : ResolvePackage

    if (package == null) {
        // Try to find it in the source (regardless of version)
        // We use resolve package here since we want to take any constraints into account
        package = sourceRepository.FindPackage(packageId, version, constraintProvider);

        // If we already have this package installed, use the local copy so we don't
        // end up using the one from the source repository
        if (package != null) {
            package = localRepository.FindPackage(package.Id, package.Version) ?? package;
        }
    }

NuGet.PackageRepositoryExtensions :

    public static IPackage FindPackage(this IPackageRepository repository, string packageId, Version version) {
        return FindPackage(repository, packageId, version, constraintProvider: null);
    }

Developer
Sep 14, 2011 at 5:46 PM

This code you pasted doesn't come into play when talking about the machine cache (unless the aggregate repository includes the machine cache but that's orthogonal). The only relevant code is in the DataServicePackageRepository itself. But are we trying to fix the bug here? Are you planning to submit a pull request?

Sep 15, 2011 at 12:38 AM

DataServicePackageRepository, when it creates a DataServicePackage, will cause the DataServicePackage to cache itself locally, right?  So what I was asking is whether the DataServicePackageRepository, when instantiated as a sourceRepository, and asked to "FindPackage", would cause the package to be cached.  Are you saying that this is not the case?

Ignoring the fact that the DataServicePackage knows of the MachineCache directly and caches itself, I guess the bug we are seeing is that the source of a package is not respected when the same ID is found in the MachineCache.  Is this your interpretation of the issue?

Developer
Sep 15, 2011 at 2:35 AM

Yes, that's exactly right. Finding a package doesn't cause it to get cached, downloading it does (accessing files etc). The issue you're seeing is the fact that we use the machine cache as the first source in the aggregate repository when restoring packages.

Sep 29, 2011 at 12:54 PM

Sorry, just getting back to this.  So I think the issue is still source-ignorant caching.  Although checking the remote source first may force the download of the package due to the hash being different, this just becomes a race for who gets there first.  It fixes the problem in a very single-threaded, duplicative manner.

1) Ask for package A v1.0 from SourceA.  Gets downloaded and installed in cache.

2) Ask for package A v1.0 from SourceB.  Checks remote, compares to cache, fails hash check, overwrites cache and installs.

3) Ask for package A v1.0 from SourceA again, rinse and repeat compare/fail/download cycle.
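
The three steps above can be sketched as a cache keyed only by id and version that re-downloads whenever the remote hash differs. `SourceIgnorantCache` and its members are hypothetical names for illustration, and the hash strings are placeholders, not real package hashes:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of a cache keyed only by id and version.
class SourceIgnorantCache
{
    private readonly Dictionary<string, string> _hashByKey = new Dictionary<string, string>();

    public int Downloads { get; private set; }

    // Returns true when the package had to be downloaded (again).
    public bool Install(string id, string version, string remoteHash)
    {
        string key = id + "/" + version;
        string cachedHash;
        if (_hashByKey.TryGetValue(key, out cachedHash) && cachedHash == remoteHash)
        {
            return false; // cache hit: same id, version and hash
        }

        _hashByKey[key] = remoteHash; // overwrite whatever was cached before
        Downloads++;
        return true;
    }
}

static class CycleDemo
{
    static void Main()
    {
        var cache = new SourceIgnorantCache();
        cache.Install("A", "1.0", "hash-from-SourceA"); // step 1: first download
        cache.Install("A", "1.0", "hash-from-SourceB"); // step 2: hash mismatch, overwrite
        cache.Install("A", "1.0", "hash-from-SourceA"); // step 3: mismatch again
        Console.WriteLine(cache.Downloads); // prints 3
    }
}
```

Every alternation between the two sources costs a full download, so in this model the cache never helps.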

With the proposed change to allow an environment variable to override the MachineCache location (pull request submitted), this can be avoided by setting the MachineCache location prior to each NuGet.exe run, but this is pretty ugly.  I think it may be hard to fix this in a more elegant way without restructuring the MachineCache a lot more.  If you are happy with it I will raise an issue as stated above?

Developer
Sep 29, 2011 at 1:06 PM
Edited Sep 29, 2011 at 1:08 PM

Let me ask this: why is it that you have so many versions of a package with the same id and version that are seemingly different? Our assumption that packages with the same id and version are the same package is the reason we do source-ignorant caching.

Sep 29, 2011 at 1:20 PM

Large organisation.  Large codebase.  Branching structure that dictates a Development and a Release branch, with "promotion" of code between them based on code quality acceptance criteria.  Requirement to maintain binary packages on both Release and Development, but essentially they reflect the same code and the same prospective release version and are marked as such, with versioning including TFS revision numbers and build numbers differentiating patch/build.

Currently, branching strategy cannot be changed.  Regardless of our situation, this seems to be a valid issue around a valid use case, albeit remediated slightly if the caching is fixed to respect remote sources first.  How about I just raise a bug?

Developer
Sep 29, 2011 at 1:24 PM

So multiple people build different packages with the same id and version (is that the summary?)

Feel free to file a bug. You're asking for a different feature entirely (that I haven't seen many ask for in general).

Sep 29, 2011 at 1:35 PM

Filed a bug.  Not asking for a feature, hope I clarified it enough in the bug report.  Basically, without better security around the packages and no source specific caching, NuGet just grabs whatever matches (or exceeds) the Id and Version from the cache, however it got there and wherever it came from.  Regardless of who creates a package with the same Id and Version, I would like to know that where I explicitly tell NuGet to get a package from, it actually does.  

The more functionality above and beyond plain old "dll copy" you add to NuGet, the more I expect security issues like this to become a problem.

Developer
Sep 29, 2011 at 5:07 PM

As Fowler said, it sounds like a bug that we are picking up items from the cache instead of pulling them from the feed. We need to change the way we are listing the machine cache in our aggregate repository so that we don't treat it as a source for packages.

Developer
Sep 29, 2011 at 5:33 PM

I think the fundamental problem is that you want package id and version to be per source, while NuGet as a whole (not just the machine cache) takes package id and version to be the universally unique hash code of a package regardless of the source.

This also comes in when using the All node in the package manager dialog. We merge packages from sources and you really don't know where a package comes from. We decided early on in nuget to treat packages with the same id and version as the same (even though sometimes it's not the case), and I think you're wanting us to change that behavior, correct?

Coordinator
Sep 29, 2011 at 5:37 PM

It does sound like the request is to change that behavior in this particular case. In general, I think the behavior is correct. But when you specify a -Source, we should consider treating that as a further discriminator.

Developer
Sep 29, 2011 at 5:39 PM
Edited Sep 29, 2011 at 5:47 PM

That would mean that we persist that information somewhere.

Coordinator
Sep 29, 2011 at 5:41 PM

Well, we could have -Source also imply no-cache.
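
One reading of this suggestion, as a minimal sketch: when a source is named explicitly, the cache is left out of the search order altogether. `SourceSelection` and `BuildSearchOrder` are hypothetical names, "MachineCache" is only a label here rather than the NuGet type, and the feed URLs are made up:

```csharp
using System;
using System.Collections.Generic;

static class SourceSelection
{
    // Hypothetical helper: returns the repositories to search, in order.
    public static List<string> BuildSearchOrder(string explicitSource, IEnumerable<string> configuredSources)
    {
        var order = new List<string>();
        if (!string.IsNullOrEmpty(explicitSource))
        {
            // An explicit source implies no cache: search that source only.
            order.Add(explicitSource);
            return order;
        }

        // No explicit source: cache first, then every configured feed.
        order.Add("MachineCache");
        order.AddRange(configuredSources);
        return order;
    }

    static void Main()
    {
        var configured = new[] { "http://feed-a/nuget", "http://feed-b/nuget" };
        Console.WriteLine(string.Join(", ", BuildSearchOrder("http://feed-a/nuget", configured)));
        Console.WriteLine(string.Join(", ", BuildSearchOrder(null, configured)));
    }
}
```

The trade-off in this reading is that an explicit source always hits the network, which addresses the wrong-package problem at the cost of the cache's speed-up.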

Sep 29, 2011 at 11:53 PM
Haacked wrote:

It does sound like the request is to change that behavior in this particular case. In general, I think the behavior is correct. But when you specify a -Source, we should consider treating that as a further discriminator.

 

+1 for this suggestion, although I think it would have some far-reaching impact...

To use NuGet to manage internal dependencies in a rapidly changing build process, the only way is to host your own gallery server.  In an environment where the version is effectively transient (i.e. where you use a floating version - versionless packages folders, -latest command line flag etc. as per other discussions...) I think it makes sense to identify a package by id, version AND source regardless.  You need to know where the package came from when there are possibly multiple sources.  There's an assumption in the current design that the nuget.org feed is the only one being used, which is not the case :)

 

Oct 1, 2011 at 5:35 AM

Look, if this was an explicit design decision, fair enough.  I don't think I am saying that uniqueness should be based on id/version/source, just that if I explicitly state that an id/version must come from a particular source, then that is where it should come from.  So yes, there are two issues.  First, the current implementation uses the cache without regard to the explicitly stated source.  Second, once that first issue is fixed, the hash/id/version uniqueness means you can get a download/overwrite cycle when specifying different sources with the same id/version (which sounds like a design decision and affects performance more than anything).

Just wondering if a better pattern might be:

1) A cache subdirectory (or equivalent partitioning) per source
2) MachineCache aggregates these if no explicit source request (aggregation may cause issues where you have duplicates, but this could be explicitly handled).
3) If source is stated explicitly, use only that particular source cache subdirectory
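
A minimal sketch of points 1 and 3, assuming the subdirectory name is derived from a hash of the source URL. `PerSourceCache` and `GetCacheDirectory` are hypothetical names, the feed URL is made up, and nothing here reflects how the MachineCache is actually laid out:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

static class PerSourceCache
{
    // Maps a source URL to its own subdirectory under the cache root.
    public static string GetCacheDirectory(string cacheRoot, string sourceUrl)
    {
        // Normalise so trivially different spellings share one directory.
        // Lower-casing the whole URL is a simplification for this sketch.
        string normalized = sourceUrl.Trim().TrimEnd('/').ToLowerInvariant();

        using (var sha = SHA256.Create())
        {
            byte[] digest = sha.ComputeHash(Encoding.UTF8.GetBytes(normalized));
            // The first 8 bytes as hex keep the folder name short.
            string folder = BitConverter.ToString(digest, 0, 8).Replace("-", "");
            return Path.Combine(cacheRoot, folder);
        }
    }

    static void Main()
    {
        string root = Path.Combine(
            Environment.GetFolderPath(Environment.SpecialFolder.LocalApplicationData),
            "NuGet", "Cache");

        // With an explicit source, only this directory would be consulted.
        Console.WriteLine(GetCacheDirectory(root, "http://feed-a/nuget"));
    }
}
```

Point 2 (aggregating across the subdirectories when no source is given) is left out; it would need a rule for duplicates with the same id and version.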

Obviously this would be hard to implement with the existing design of the MachineCache/DataServicePackage, as there is no elegant external way to inject knowledge of source.  I don't believe even the DataServicePackage has knowledge of this?  Just a thought...

Oct 26, 2011 at 1:08 PM

Just wondering if someone with commit rights will get a chance to look at this: http://nuget.codeplex.com/SourceControl/network/Forks/BenPhegan/MachineCacheOverride/contribution/1581 before 1.6 goes out?  We would love to see this in so we can drop the use of some forked code....