Improvements for Nuget

Topics: General
Jul 12, 2013 at 7:36 PM
After some tweeting with @jeffhandley and @dotnetjunkey about nuget improvements they asked me to write them down in a forum post because it was too much to handle on twitter.

Which packages are installed as dependencies

Currently when you install a package, it will install all dependencies and register them in packages.config like you installed them directly. This is all wonderful until you encounter the following scenarios:
  • A developer has installed a number of packages and then installs another one by accident. The only reliable way to back out the change is to revert packages.config. Uninstalling a package does not uninstall its dependencies, and it is not easy to see which dependencies a package pulled in (you have to inspect the package metadata in the client dialog/gallery).
  • A package used to rely on, for example, antlr, but a new version switches to another parser. When you update the package, the package itself gets updated, but the antlr package remains installed even though you don't need it anymore. (Maybe you still need it in your project, but there is no indication that it can potentially be removed.)
This is usually not a problem if you have only a few NuGet packages installed, but with 30-40 packages installed it becomes a mess.

I propose to change the way package installation is registered in packages.config. Currently packages.config looks like this.
<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="EnterpriseLibrary.Data" version="6.0.1304.0" />
  <package id="EnterpriseLibrary.Common" version="6.0.1304.0" />
</packages>
I propose the new format to be:
<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="EnterpriseLibrary.Data" version="6.0.1304.0">
     <package id="EnterpriseLibrary.Common" version="6.0.1304.0" />
  </package>
</packages>
When the user then installs EnterpriseLibrary.Common by hand, the file changes to:
<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="EnterpriseLibrary.Data" version="6.0.1304.0">
     <package id="EnterpriseLibrary.Common" version="6.0.1304.0" />
  </package>
  <package id="EnterpriseLibrary.Common" version="6.0.1304.0" />
</packages>
This of course requires that the version numbers of the two EnterpriseLibrary.Common elements stay in sync.

When the dependency has dependencies of its own, these are nested again under each occurrence of that node. (I add Unity here as an example; EnterpriseLibrary.Common doesn't actually have any dependencies.)
<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="EnterpriseLibrary.Data" version="6.0.1304.0">
     <package id="EnterpriseLibrary.Common" version="6.0.1304.0">
        <package id="Unity" version="6.0.1304.0" />
     </package>
  </package>
  <package id="EnterpriseLibrary.Common" version="6.0.1304.0">
     <package id="Unity" version="6.0.1304.0" />
  </package>
</packages>
I understand that this makes the content of the packages.config file more complicated, but it will make the system more robust.
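To illustrate what the nested format would buy you, here is a minimal Python sketch that reads the proposed layout and separates explicitly installed packages from packages that are only present as dependencies. The nested format is only this proposal, not something NuGet reads today, and the function name `classify` is made up for the example.

```python
import xml.etree.ElementTree as ET

# The nested layout proposed above (hypothetical, not a real NuGet format).
PROPOSED = """<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="EnterpriseLibrary.Data" version="6.0.1304.0">
     <package id="EnterpriseLibrary.Common" version="6.0.1304.0" />
  </package>
</packages>"""


def classify(config_xml):
    """Return (explicit, dependency_only) sets of package ids."""
    root = ET.fromstring(config_xml.encode("utf-8"))
    # Direct children of <packages> were installed explicitly.
    explicit = {p.get("id") for p in root.findall("package")}
    # Every <package> element at any depth, including nested dependencies.
    everything = {p.get("id") for p in root.iter("package")}
    return explicit, everything - explicit


explicit, dependency_only = classify(PROPOSED)
print("explicit:", sorted(explicit))
print("dependency only:", sorted(dependency_only))
```

Once EnterpriseLibrary.Common is also installed by hand, it shows up as a top-level element and moves into the explicit set, so an uninstall can tell that it must stay.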

Package Source tracking

There is nothing which prevents us from installing packages from multiple sources, and there is nothing in the packages which identifies the source they were published from. This sometimes creates (big) problems.

In our case, we download the packages from the nuget.org package source, strong name the content and publish them on our own source. When people mess up and the nuget.org package source is accidentally listed before the internal package feed, package restore will download the package from the original source (if a source is not specified). Even worse, when there is a new version of the package on the official feed, it will show up as an update in Visual Studio when the "All sources" node is selected and potentially mess things up even more.

Allow package creators to mark dependent packages as "Don't reference"

Currently when you create a package which depends on another package, the dependent package is installed as it was configured. This has an annoying side-effect that the dll is automatically referenced, even when you don't expose any type from the package you use.

Imagine, we created a "common" assembly which depends on SharpZipLib to enable zip support. We wrapped the calls in our methods so we are free to switch out to another zip library and we don't expose any types of SharpZipLib. But due to the referencing of the sharpziplib library the methods are available for use and also pollute intellisense.

If it were possible to specify "DontReferenceAssemblies" in the nuspec file, it would be possible to install the SharpZipLib package without getting the reference pollution. (I'm not sure if we would need a kind of "bin deployable" system to get the dll's copied to the build output folder.)

(To give an extreme example, we have a project which used to have 4 referenced dll's and after nuget conversion it has 69 referenced dll's)

This problem has already been raised in https://nuget.codeplex.com/discussions/436612

Advanced Nuget Cache settings for Build servers

Currently the nuget cache folder location can be specified with an environment variable and the size is capped at 200 packages. This is probably enough for a development pc, but on a build server this is not sufficient (we have 215 custom packages at the moment, not counting different versions). Also, specifying the environment variable is not really consistent with other configuration options.

It should be possible to specify the following cache options:
  • Cache location in a machine level nuget.config file
  • Cache location in the nuget.config file for a project
  • Max size of all cache folders in a machine level nuget.config (to prevent flooding the server)
  • Max number of packages in a cache folder on machine level
  • Max number of packages in a cache folder on project level
This would make it easier to deal with concurrency issues on the cache and with packages being cleared from the cache when they should not be.

Thread Safe Cache

When the local cache is full there are contention issues with clearing the cache. (Package restore fails because packages have disappeared from the cache.)
See https://nuget.codeplex.com/discussions/445184

Besides that, commands should not fail due to cache problems, they should give a strong warning and fetch the needed packages from the feed.

Source specific NuGet cache

Currently there is only one nuget cache folder. This is dangerous, as it can trigger installation of incorrect dependencies.

Imagine the following scenario:

Package "X 1.0.0" depends on package "Y 1.0.0" on the private nuget feed. Earlier I worked a bit on a personal project and installed package "Y 1.0.4" from the nuget.org feed. When I now install package X 1.0.0 in my work project it will grab the package "Y 1.0.4" from the cache.

Currently this is not an issue for most people as they usually depend on the public nuget.org feed, but as services like myget become more popular, this kind of (hard to spot) mishap will become a real showstopper.

(as a workaround we run the commands with -NoCache which is not great for performance)
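One way to picture a source specific cache is to derive the cache folder from the feed URL, so the same package id and version from two feeds can never collide. This is only a sketch of the idea in Python, not how NuGet lays out its cache; the function names, the folder naming scheme and the internal feed URL are all assumptions made for the example.

```python
import hashlib
from pathlib import PurePosixPath


def cache_dir(cache_root, source_url):
    """Return a per-source cache folder (hypothetical layout)."""
    # Normalize so trivial differences in the URL map to the same folder.
    normalized = source_url.strip().rstrip("/").lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]
    return PurePosixPath(cache_root) / digest


def cached_package_path(cache_root, source_url, package_id, version):
    """Return where a package from a given source would be cached."""
    return cache_dir(cache_root, source_url) / f"{package_id}.{version}.nupkg"


# The same package from two feeds ends up in two different folders.
public = cached_package_path("/cache", "https://nuget.org/api/v2/", "Y", "1.0.4")
private = cached_package_path("/cache", "https://nuget.example.internal/feed/", "Y", "1.0.4")
print(public)
print(private)
```

With a layout like this, a cache lookup for the private feed cannot return a package that was downloaded from nuget.org, so -NoCache would no longer be needed for correctness.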

Better support of a "common" package repository

It is possible to specify a custom package repository location (instead of the packages folder in the solution folder). This is well liked in the age of (tiny) ssd drives as it can considerably reduce required disk space. It is also very interesting for build servers as you don't have to store the same packages over and over for each solution. (Think a 300MB ui control library for example.)

The problems with this common package repository at the moment are:
  • Installing packages in this folder is not 100% thread safe. (we get build failure on and off)
  • Sometimes packages get corrupted in this shared folder. (I suspect because 2 builds try to write in the same folder.)
  • Package restore ignores the setting. (I still have to verify whether this is the case in the latest versions; if it no longer ignores the setting, it should still be checked that the hint paths get updated correctly.) We work around this by using symlinks.

Update does nothing when package restore has not run

When checking out a project and calling nuget update packages.config, nothing happens if you didn't first run package restore. This does not make any sense.

Improve configuration settings

Currently nuget configuration is a bit spread out all over the place and there is not much consistency. Some things can be put in user config, but not project config, other things need to be set as environment variables. This is annoying to manage from an IT perspective, and it is annoying to manage when making tooling which uses nuget config information. It is also hard to figure out which configuration is available when using nuget.core.dll. (Maybe a strongly typed class with the possible settings?)

Performance of update

Running nuget update packages.config is very slow when many of the packages in packages.config share the same dependencies. Imagine a package containing helper methods which is a dependency of virtually all the other packages. The update command will check over and over again if there is a new version available for that same package. (We have a project with 69 unique packages installed which does 800+ update checks. At 0.8 to 1.5 seconds per check this usually takes 15 minutes to complete.)
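The repeated checks could be avoided by remembering the result per package id. The following Python sketch shows the idea under stated assumptions: `latest_version` stands in for the expensive feed query, and the function and parameter names are made up for the example. It is not how the nuget update command is implemented.

```python
def find_updates(installed, dependencies, latest_version):
    """Return {package id: newer version}, querying the feed once per package.

    installed:      dict of package id -> installed version
    dependencies:   dict of package id -> list of package ids it depends on
    latest_version: callable(package id) -> latest version on the feed
                    (placeholder for the slow network call)
    """
    checked = {}

    def check(pkg):
        if pkg in checked:
            return  # already queried, skip the feed round trip
        checked[pkg] = latest_version(pkg)
        for dep in dependencies.get(pkg, []):
            check(dep)

    for pkg in installed:
        check(pkg)

    return {p: v for p, v in checked.items() if installed.get(p) != v}
```

With 69 unique packages this would do 69 feed queries instead of 800+, no matter how many packages share the same helper dependency.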
Developer
Jul 15, 2013 at 6:08 PM
Thanks for a great write-up. Lots of good suggestions here. I'll try to comment as much as I can.

Which packages are installed as dependencies

When you uninstall a package from the dialog, NuGet does ask you if you want to uninstall all dependencies too. I don't know if you have seen that dialog.
When you uninstall from the PS console, you're correct that the behavior is to NOT uninstall dependencies. That is intentional. We want NuGet to always be conservative and not uninstall stuff without your explicit action. If you do want to uninstall dependencies, you can pass the -RemoveDependencies flag to the Uninstall-Package command.

Package Source tracking

This has come up several times for various scenarios. It's on our backlog. I imagine eventually we'll implement it. It's just a matter of prioritizing work.

Allow package creators to mark dependent packages as "Don't reference"

This is an interesting idea. To make sure I understand correctly, you don't want SharpZipLib to be added as reference to project, but you do want it to be copied to the output folder when project is built, correct? I have seen the same needs when a managed assembly needs to reference a native assembly, e.g. SqlServerCompact package. It's on our backlog too.

Implementation-wise, in order to accomplish this, we need to copy SharpZipLib to the same folder as the main assembly.

Thread Safe Cache

We're aware of this problem and we do have an internal fix for it.

Source specific NuGet cache

I'm not sure if this is a useful feature.

Better support of a "common" package repository

Can you file a bug on the threading issue? It's probably related to the threading issue of the Cache folder.

Update does nothing when package restore has not run

Good point. Can you file a work item for this?

Improve configuration settings

Can you clarify which settings can be set in project config, but not in user config? Regarding environment variables, they are often required on build machines where %AppData% folder is not available.

Performance of update

Yes, this is a known issue and I have a plan to improve the overall performance of it. We could be a lot smarter in caching package update versions.
Jul 16, 2013 at 8:12 PM

Which packages are installed as dependencies

The remark about removal is a good point, but it does not really work well when a dependent package is also installed explicitly. This is information which is easily lost when the people maintaining the software are no longer the ones who originally wrote it. And once a package is gone (and your build is now failing), you have to scan the log files to find out which version of the secondary package you had installed.

Also, finding out if a package is a dependency of another package is quite easy. The problem is that once you have installed the package, there is no trace left of whether it was installed explicitly or pulled in as a dependency. Internally we track this information so we have a clear view on who to notify when there are changes on the lower level packages.
(This is done by pulling out the direct references from the build output to determine which packages were pulled in as dependency from other packages, and which packages were used/referenced directly. I would prefer if we did not have to do this step.)

Package Source tracking

I am willing to implement this if we can agree on how to implement it. (I can write a proposal)

Allow package creators to mark dependent packages as "Don't reference"

Indeed, that is what we need.

Source specific NuGet cache

The issue is that I have 2 builds running on one build server. One from source X (code signed) and one from source Y (not code signed.) Currently it is hit or miss on which package you will end up with in your solution. (workaround -NoCache)

Better support of a "common" package repository

I will log a work item and will try to write out the scenario to reproduce this problem.

Update does nothing when package restore has not run

I will create a work item.

Improve configuration settings

As far as I know all settings in project level config can be set in appdata config.
Jul 17, 2013 at 7:27 AM
On "Which packages are installed as dependency":

It would be enough to have an attribute on those packages which were installed manually; there's no need to nest all of its dependencies.

e.g.

<?xml version="1.0" encoding="utf-8"?>
<packages>
  <package id="EnterpriseLibrary.Data" version="6.0.1304.0" manually_installed="true" />
  <package id="EnterpriseLibrary.Common" version="6.0.1304.0" />
</packages>

BTW, one option would be to extend the meaning of "allowed_versions": it should be explicitly set for manually installed packages, as API changes on those are most likely to break our project. (Conversely, API changes on automatically installed dependencies should not have an impact on our project as long as we don't use them directly.)

e.g.

<?xml version="1.0" encoding="utf-8"?>
<packages>
<package id="EnterpriseLibrary.Data" version="6.0.1304.0" allowed_versions="[6,7)" />
<package id="EnterpriseLibrary.Common" version="6.0.1304.0" />
</packages>
Developer
Jul 18, 2013 at 5:05 PM

Which packages are installed as dependencies

I agree that it's useful to know if a package is installed directly or as a result of dependency of another package, for the purpose of keeping track of packages over the long term. I just think that the current behavior of uninstall-package command, i.e. not uninstalling dependencies by default, is a good choice.

I also agree with broggeri that we don't need to nest all of dependencies. That would become very cumbersome and error prone.

Here's a question for my better understanding:

What are specific scenarios/commands that you think will benefit users if we store that info into packages.config as broggeri suggests?
Coordinator
Jul 22, 2013 at 4:42 AM
This is all really good. I appreciate you thinking this all through and submitting all of these ideas. Some of them were already on our mind, and some not. It's great to understand your scenarios better so that we can really understand the problems these ideas will address.

Thanks,
Jeff
Jul 22, 2013 at 8:13 AM
dotnetjunky wrote:
What are specific scenarios/commands that you think will benefit users if we store that info into packages.config as broggeri suggests?
  • "Uninstall-Package -RemoveDependencies" should not remove the dependencies which are tagged as "manually_installed"
  • "nuget pack project.csproj" should include "manually_installed" packages in the declared dependencies.
Jul 25, 2013 at 6:14 PM
For project dependencies:

I would love to see a dependency chain of all my packages.

Step 1: I would like to be able to click on a package and see what packages depend on it in my project. So if I click on jQuery, I would see that jQueryUI and jQuery Validation depend on it. This would prevent developers from trying to uninstall packages that they think aren't being used, but really are.

Step 2: I would like to see all the packages that no other package depends on. These would hint to the developer which ones may or may not be in use. So let's say the list included Package X. Then I could check to see if Package X was actually being used and if not, remove it, which would potentially show me other packages that weren't being used.

Step 3. I would like this info output to a text document (or possibly in the packages.config) so that developers can keep track of which packages are leaf nodes. This way if a new package ends up as a leaf, possibly due to Package A replacing a dependency on Package B with Package C, the developers would see that there is a package that needs to be investigated to see if it is still needed.
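As a small sketch of steps 1 and 2, assuming the dependency information is already available as a dictionary (the function names are made up for the example):

```python
def dependents(dependencies):
    """Invert the graph: package id -> set of packages that depend on it."""
    reverse = {pkg: set() for pkg in dependencies}
    for pkg, deps in dependencies.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(pkg)
    return reverse


def leaf_packages(dependencies):
    """Packages that no other installed package depends on."""
    return sorted(p for p, users in dependents(dependencies).items() if not users)


deps = {
    "jQuery": [],
    "jQueryUI": ["jQuery"],
    "jQuery Validation": ["jQuery"],
}
print(dependents(deps)["jQuery"])  # step 1: who depends on jQuery
print(leaf_packages(deps))         # step 2: candidates to investigate
```

Writing the output of `leaf_packages` to a text file after every install or update would give the history asked for in step 3: a new entry in that file is a package that just became a leaf.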

Just some thoughts,
Dan
Developer
Jul 25, 2013 at 6:41 PM
@MisinformedDNA Have you tried the package visualizer? http://docs.nuget.org/docs/workflows/package-visualizer
Jul 25, 2013 at 7:55 PM
Yea, that looks pretty good. I only have Professional, so I guess I'm out of luck. Good to know in the future.
Jul 26, 2013 at 8:00 PM
(Sorry for the late reply, I have been on vacation)

I suggested storing the dependencies in the packages config because it makes it easy to see if a dependency can be removed or not when using the -removedependencies option. (if it is installed manually, don't remove, if it is used in another package, don't remove.)

Storing this information in the packages.config has an additional advantage for us, as it allows tracking dependencies between projects without having to run package restore. I can just index all the packages.config files and keep the full dependency graph of our projects. (We have 1200+ projects, so a solution with all of them just does not work.)

Additional suggestions:

Store related .csproj in the packages.config

Currently there is no relation between project file and the packages.config file except that they are stored in the same folder. When you have multiple project files in the same folder (we have this for multi-platform compilation and some test projects) you have no idea for which projects the packages are defined. It would also fix the problem that the projects where you did not install the packages are messed up and are very hard to fix.

Index package contents in nuget gallery + find package for dll feature.

Currently it is very easy to start a new project with NuGet. But converting existing projects is pretty hard. The only thing you have in your existing project is the dll and the version. But there is no way to find out which Nuget package would match your dll. Many packages are named such that the name has no relation with the dll name. It would be very cool if we could right click on a reference in visual studio and "Find" the package (+version) where this dll is available.
(Currently we have indexed all the nuspec files and search that index to find the package that matches the dll.)
Jul 26, 2013 at 8:05 PM
I created ticket https://nuget.codeplex.com/workitem/3537 to solve "Update does nothing when package restore has not run"
Developer
Jul 26, 2013 at 10:19 PM

Store related .csproj in the packages.config

The current thinking is to rename packages.config to packages.<project name>.config. That would solve the problem with multiple projects in the same folder, but at the same time avoid adding extra metadata to packages.config. NuGet can be used by other clients besides Visual Studio, so we want to avoid tying the format of packages.config to VS.

Index package contents in nuget gallery + find package for dll feature.

I believe the gallery crew has discussed this in the past. I don't know when we will do it, but it's on our agenda.
Jul 29, 2013 at 12:46 PM
Another one popped up today

Version number should be optional in nuspec file

The version element in the .nuspec file is a required element although it can be specified on the nuget pack command line. Since we always populate the version from the command line (on the build server) it does not make sense to have a completely out of date number in the nuspec file. Also, if we would fill out a "dummy" version number, when the package is built with nuget pack without a version number, it will be created incorrectly.

I created the ticket https://nuget.codeplex.com/workitem/3541 for this and detailed the scenarios.
Jul 29, 2013 at 7:15 PM
With regard to Package Source tracking -> have you tried requiring your developers to use only your own feed? Doing that makes it easier to track where a package comes from. One could use their own NuGet server or something like http://docs.myget.org/docs/how-to/make-myget-list-and-automatically-mirror-packages-from-other-feeds#Displaying_search_results_from_NuGet.org_or_other_package_sources_on_your_feed
Jul 29, 2013 at 8:15 PM
We enforce the usage of our own feed, but people sometimes mess things up.

We actually have 2 feeds in place. One where the developers can push to themselves, (mainly to share in progress packages) and one where only the build server can push to. (With correctly strong named and code signed packages.)

The issue is that when things work fine on their pc, they fail on the build server and the errors are rather vague.

The main reason to host our own feed is to ensure all packages are correctly strong named and to verify that the content of the packages is "trustworthy".
Jul 29, 2013 at 8:46 PM
Another potential improvement

Fine grained control of nuget update -safe

Currently nuget update -safe updates to latest revision of a package (well, actually latest prerelease of the next minor when -pre is specified). It would help developers to get the right version of a component if they could specify on what level they want to update.

See https://nuget.codeplex.com/workitem/3542 for details.
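To make the idea of an update level concrete, here is a minimal Python sketch. The level names ("major", "minor", "patch") and the function are assumptions for illustration and are not taken from the work item or from the nuget command line; prerelease versions are not handled.

```python
def parse(version):
    """Turn '1.2.3' into (1, 2, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def pick_update(current, available, level):
    """Pick the highest newer version allowed by the chosen level.

    level 'major': any newer version
    level 'minor': the major number must stay the same
    level 'patch': major and minor numbers must stay the same
    """
    fixed = {"major": 0, "minor": 1, "patch": 2}[level]
    cur = parse(current)
    candidates = [
        v for v in available
        if parse(v)[:fixed] == cur[:fixed] and parse(v) > cur
    ]
    return max(candidates, key=parse, default=None)


available = ["1.2.4", "1.3.0", "2.0.0"]
for level in ("patch", "minor", "major"):
    print(level, pick_update("1.2.3", available, level))
```

The function returns None when no version qualifies, so a caller can leave the package untouched in that case.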