NuGet handling of transitive dependencies

Topics: General
Sep 30, 2013 at 1:46 PM
Sorry if this is not constructive, but I'm coming to NuGet as a previous Maven user.

When I add a NuGet reference to my project, all of that project's dependencies are also added to my references.

Would it be possible for transitive dependencies, which are only needed at runtime, to not be added as direct references to my project?

The way NuGet currently works makes it needlessly hard to do dependency analysis and creates issues with build chains getting really long, even though my project doesn't need to be rebuilt if a transitive dependency changes.
Oct 9, 2013 at 8:45 PM
I too am fighting this same issue and have been trying to figure out how to address it.

Basically, it would be very beneficial to know that Package A has a direct reference to packages B and E but not to packages C, D and F, because those packages were simply picked up as dependencies of B and E.

A simple change to the packages.config file would handle this.

From the previous example, the packages.config for A would look something like this:
<packages>
   <package id="B" version="1.1.0" targetFramework="net40">
      <packageDependencies>
         <package id="C" version="1.2.1" targetFramework="net40" />
         <package id="D" version="1.2.1" targetFramework="net40" />
         <package id="E" version="1.2.1" targetFramework="net40">
            <packageDependencies>
               <package id="F" version="1.3.5" targetFramework="net40" />
            </packageDependencies>
         </package>
      </packageDependencies>
   </package>
   <package id="E" version="1.2.1" targetFramework="net40">
      <packageDependencies>
         <package id="F" version="1.3.5" targetFramework="net40" />
      </packageDependencies>
   </package>
</packages>
The format is a little verbose but it plainly shows which packages are direct dependencies and which packages are transitive dependencies and allows for a package to be both at the same time.

This also allows for NuGet to perform its normal "minimum required version" semantics by comparing versions of both direct and transitive dependencies in order to determine which version to actually reference within the project.

It would be even nicer still if the project only contained references to direct dependencies and transitive dependencies were simply copied from their at-rest location (wherever NuGet is configured to store installed / downloaded packages) to the output folder as part of a post-build step or a project target include file.

Either solution will accomplish the same goal and would allow for tooling to be written to figure out complex dependency chains.
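As a rough illustration of the kind of tooling this would allow, here is a minimal Python sketch that reads the nested format and separates direct from transitive packages. The `packageDependencies` element is only the proposal from this post, not something NuGet understands, and the `classify` helper and the shortened sample are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested packages.config, shortened from the proposal above.
PROPOSED = """
<packages>
  <package id="B" version="1.1.0" targetFramework="net40">
    <packageDependencies>
      <package id="C" version="1.2.1" targetFramework="net40" />
      <package id="E" version="1.2.1" targetFramework="net40">
        <packageDependencies>
          <package id="F" version="1.3.5" targetFramework="net40" />
        </packageDependencies>
      </package>
    </packageDependencies>
  </package>
  <package id="E" version="1.2.1" targetFramework="net40">
    <packageDependencies>
      <package id="F" version="1.3.5" targetFramework="net40" />
    </packageDependencies>
  </package>
</packages>
"""


def classify(xml_text):
    """Return (direct, transitive) sets of package ids."""
    root = ET.fromstring(xml_text)
    # Top-level <package> elements are the direct dependencies.
    direct = {p.get("id") for p in root.findall("package")}
    # Any <package> nested under <packageDependencies> was pulled in transitively.
    transitive = {p.get("id") for p in root.findall(".//packageDependencies/package")}
    return direct, transitive


direct, transitive = classify(PROPOSED)
print("direct:", sorted(direct))
print("transitive only:", sorted(transitive - direct))
print("both:", sorted(direct & transitive))
```

A package such as E shows up in both sets, which is the "both at the same time" case mentioned above.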
Oct 10, 2013 at 3:12 PM
I think I understand what you want although it's for different reasons. I'd like to better understand the reasons and importance of this.
The way NuGet currently works makes it needlessly hard to do dependency analysis and creates issues with build chains getting really long, even though my project doesn't need to be rebuilt if a transitive dependency changes.
@Tanhe Could you explain more about how the issues with build chains manifest themselves? Are you spending a lot of time updating one dependent package at a time manually? Are you using some continuous integration process where you pick up a new package version on a regular basis? Or is this 'depend on local package which is directly output by another project of mine'? Is it causing unnecessary rebuilds once a day? Many times a day?
This also allows for NuGet to perform its normal "minimum required version" semantics
@MarkAD88 Are you concerned about some specific scenarios you encountered where NuGet did not compute the minimum required version correctly? What was the impact?
Oct 10, 2013 at 3:13 PM
[I also have one thought about the design so far. Packages.config is currently designed for really easy authoring and editing, and it might be nice to keep that feature by splitting it into two files. Keep one where you can specify all your direct dependencies by hand; that is the 'input' to the NuGet process. This is 'packages.config'. The other file would be where NuGet caches its computed dependency tree with all the transitive dependencies. This would be called 'packages.dependencies' or something. The change would be that transitive dependencies now only appear in the second file, 'packages.dependencies'.]
Oct 10, 2013 at 3:51 PM
@tilovell

design suggestion:
  • make packages.config contain only the direct dependencies and their version constraints, but not the actual version installed.
  • make packages.dependencies contain all the dependencies (direct and transitive), and their actual versions
That way, packages.config is purely "authoring" data
and packages.dependencies is purely the result of "resolving" packages.config against a package repository (which may evolve over time).
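To make the authoring/resolving split concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the in-memory `REPOSITORY`, the `resolve` helper, and the rule of picking the lowest version that meets every minimum. It is not how NuGet itself resolves packages, and it ignores upper bounds and prerelease versions.

```python
# Hypothetical repository: id -> {version: {dependency id: minimum version}}.
REPOSITORY = {
    "B": {"1.1.0": {"C": "1.2.0"}, "1.2.0": {"C": "1.2.1"}},
    "C": {"1.2.0": {}, "1.2.1": {}},
}


def version_key(version):
    """Turn '1.2.1' into (1, 2, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(direct_constraints):
    """Map every reachable package to the lowest version meeting all minimums.

    direct_constraints plays the role of packages.config (id -> minimum version);
    the returned dict plays the role of packages.dependencies.
    """
    minimums = dict(direct_constraints)
    resolved = {}
    pending = list(minimums)
    while pending:
        pkg = pending.pop()
        candidates = [v for v in REPOSITORY[pkg]
                      if version_key(v) >= version_key(minimums[pkg])]
        chosen = min(candidates, key=version_key)
        if resolved.get(pkg) == chosen:
            continue  # already resolved to this version, nothing new to visit
        resolved[pkg] = chosen
        for dep, minimum in REPOSITORY[pkg][chosen].items():
            # Keep the strictest minimum seen so far for each dependency.
            if dep not in minimums or version_key(minimum) > version_key(minimums[dep]):
                minimums[dep] = minimum
            pending.append(dep)
    return resolved


print(resolve({"B": "1.1.0"}))
print(resolve({"B": "1.1.0", "C": "1.2.1"}))
```

The input only names what the author cares about; the output lists every package with a concrete version, and would only need recomputing when the input changes.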
Oct 10, 2013 at 8:50 PM
@tilovell

As far as the build chains, we're using continuous integration and while we have MANY projects, they all move forward together (ie snapshot dependencies among each other).

The way that NuGet lists transitive dependencies with the primary dependencies causes a problem (perhaps this is just a limitation of the CI system) that can be exhibited with the following example.
  • ProjA
    • ProjB
      • ProjC
      • ProjD
If ProjC is changed and therefore rebuilt, it will trigger a build of both ProjB and ProjA, and the build of ProjB then triggers another build of ProjA. ProjA gets built twice, and this is only for a tree two levels deep. We have situations where the dependency graph gets much deeper.

This can probably be overcome by delays, and collapsing the build chains within our CI system.
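For what it's worth, collapsing the chain amounts to a topological sort over the affected projects. Here is a minimal Python sketch, assuming the CI system sees the flattened edges NuGet produces (ProjA listing ProjC and ProjD as if they were direct). The `rebuild_plan` helper is hypothetical and not a feature of any particular CI product.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Project -> projects it references. ProjA also lists ProjC and ProjD because
# the transitive packages appear next to the direct ones.
DEPENDS_ON = {
    "ProjA": {"ProjB", "ProjC", "ProjD"},
    "ProjB": {"ProjC", "ProjD"},
    "ProjC": set(),
    "ProjD": set(),
}


def rebuild_plan(changed):
    """Return the changed project and everything downstream, each once, in build order."""
    affected = {changed}
    grew = True
    while grew:  # repeat until no further dependents are discovered
        grew = False
        for project, deps in DEPENDS_ON.items():
            if project not in affected and deps & affected:
                affected.add(project)
                grew = True
    # static_order() yields dependencies before the projects that use them.
    order = TopologicalSorter(DEPENDS_ON).static_order()
    return [project for project in order if project in affected]


print(rebuild_plan("ProjC"))
```

With this ordering ProjA appears once, after ProjB, instead of being triggered separately by ProjC and by ProjB.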

To me personally, the more critical issue arises from both making it too easy for developers to call into transitively dependent libraries (these are sometimes implementation details of intermediate libraries), as well as how to do analysis of how our code is actually interrelated as all dependencies appear as top-level.
Oct 10, 2013 at 9:29 PM
@Tanhe

Thanks for giving that detailed answer. To me it does sound like a large part of your particular problem with CI is having a CI or build system that, unlike e.g. VS, doesn't understand there are dependencies between the projects it is building and therefore doesn't know to build them in the correct order: C, B, A.

The other problem about unnecessarily calling into transitively dependent libraries puzzles me. I'm not sure if it's a real problem or imagined - or a technical one versus a people problem.

If it's something in the public surface area of the libraries, then why isn't it legitimate to take a dependency on the library and call them? (Unless the purpose of the intermediate dependency B is to wrap the underlying library C, in which case, it shouldn't be exposing objects from C which people will be tempted to play with. Or unless it's people not understanding the concept of wrapping/abstraction in which case they might need a lesson?)

Regarding analyzing actual dependencies, I'm not sure what the situation is but I wonder if maybe reflection might be a good way to do this, instead of trying to analyze the inputs?
Tim

@broggeri I think a negative impact of putting a version range for primary dependencies in your authoring format would be that it is ambiguous - and therefore implies a dependency on the nuget gallery plus a computation to figure out what exact version to download. You need to query nuget.org to see what versions are available. And overall that might enhance your chance of broken builds.

Additionally I can see this argument applied to the packages.dependencies file as well, should it
a) contain version ranges or
b) not yet exist! (because you only checked in packages.config)
I would therefore lean towards it both mentioning specific versions (not just version ranges) and being checked in to source control alongside packages.config, so that it isn't recomputed unnecessarily; it would only need to be updated when you do a package operation like install-package.

At which point it's tempting to say it should maybe just be packages.config after all, but with the dependencies in there having an extra field explaining why they were added, i.e. which dependencies they are satisfying. And note that it's a dependency graph, not a tree, at least if we're talking about .NET assemblies...

One other question I don't think we are clear on the answer to: would NuGet's behavior improve at all if it had this extra information about which dependencies are indirect?
Oct 11, 2013 at 7:24 AM
@tilovell: you're assuming that in the design I suggest, packages.dependencies are re-computed at each build. That is not what I meant.

packages.dependencies is still to be checked in to SCM, and reused at each build as is. Only when the direct dependencies change (hence only when packages.config changes) should one recompute (or adapt) packages.dependencies.
Oct 11, 2013 at 6:41 PM
@broggeri Indeed I jumped to conclusions. However, I'm still a little unclear as to what benefit is provided by having a version range in packages.config as opposed to a particular version... at first glance this just seems to make authoring more complicated?
Oct 13, 2013 at 12:07 AM
This also allows for NuGet to perform its normal "minimum required version" semantics
@MarkAD88 Are you concerned about some specific scenarios you encountered where NuGet did not compute the minimum required version correctly? What was the impact?
@tilovell I have not yet encountered an issue where NuGet did not calculate the minimum required version correctly. I was simply stating that, if the packages.config file were modified in the way I had suggested, the direct and transitive dependency versions could still be used to perform proper minimum version calculations without issue.
Oct 14, 2013 at 9:34 AM
@tilovell: The code in the project may depend on some feature that is only provided starting from version X, or that has been deprecated in version Y. Specifying the range in packages.config will let NuGet generate a packages.dependencies which will accommodate those constraints.

However, often the version range can be as simple as the minimum version required, so authoring doesn't have to be more complicated because of it.
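As a rough sketch of what such a constraint means, the Python below checks a version against NuGet's interval notation, where a bare version such as `1.2` is an inclusive minimum. The `satisfies` helper is hypothetical and simplified: it handles only numeric dotted versions, with no prerelease tags and no validation of malformed ranges.

```python
def version_key(version):
    """Turn '1.2' into (1, 2, 0, 0) so that '1.2' and '1.2.0' compare equal."""
    parts = [int(part) for part in version.strip().split(".")]
    return tuple(parts + [0] * (4 - len(parts)))


def satisfies(version, spec):
    """Return True if version falls inside the range described by spec."""
    spec = spec.strip()
    v = version_key(version)
    if spec[0] not in "[(":
        return v >= version_key(spec)      # "1.2" means 1.2 or higher
    lower_inclusive = spec[0] == "["
    upper_inclusive = spec[-1] == "]"
    body = spec[1:-1]
    if "," not in body:
        return v == version_key(body)      # "[1.2]" means exactly 1.2
    low, high = (part.strip() for part in body.split(","))
    if low:
        lo = version_key(low)
        if v < lo or (v == lo and not lower_inclusive):
            return False
    if high:
        hi = version_key(high)
        if v > hi or (v == hi and not upper_inclusive):
            return False
    return True


print(satisfies("1.2.1", "1.2"))        # minimum-version form
print(satisfies("2.0", "[1.0,2.0)"))    # upper bound is exclusive here
```

So the simple case stays simple: a bare minimum version is all the author needs to write, and the bracketed forms are only needed for upper bounds or exact pins.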
Oct 14, 2013 at 1:49 PM
@tilovell: Another benefit is that, if the project is also to be packaged as a nuget package, the dependencies of the nupkg can be directly extracted from packages.config
Oct 15, 2013 at 4:33 PM
Edited Oct 15, 2013 at 4:33 PM
@MarkAD88 Thanks for clarifying.

OK trying to summarize the discussion so far:
- It would be nice if nuget somehow (e.g. packages.config) made it possible to find out which packages are directly referenced vs indirectly referenced.
- Allowing specifying version ranges in packages.config might be useful when the output of your project is a nuget package.

I think generally the nuget team will agree with the niceness aspect of the first and second points. One thing we would need to figure out in trying to address it will be whether changing the format of packages.config might break or cause unexpected behavior for older client versions.

I played around with NuGet 2.7.40808 a little, and it seems to accept valid XML permutations such as added child elements or extra attributes it doesn't know about: it ignores them properly and still understands your package configuration. But ignoring them may not work with some designs: an old NuGet client reading a new packages.config format may not understand that certain direct or indirect dependencies exist.

The simplest thing I can think of which addresses the first point in a probably backwards compatible way is adding a direct="false" attribute to all indirect dependencies in packages.config. I can't yet think of anything to address the second point of dependency ranges in a backwards compatible way. Caveat: I'm not sure how any of this plays in with the team's overall plan right now - maybe there are some compelling reasons to break backwards compatibility I'm not aware of yet...?
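A minimal Python sketch of how a tool might read that attribute follows. The `direct` attribute is only the idea floated here, not an existing NuGet feature, and `split_packages` is a made-up helper. A missing attribute is treated as direct, so files written by older clients keep their current meaning.

```python
import xml.etree.ElementTree as ET

# Flat packages.config with the hypothetical direct="false" marker.
CONFIG = """
<packages>
  <package id="B" version="1.1.0" targetFramework="net40" />
  <package id="C" version="1.2.1" targetFramework="net40" direct="false" />
  <package id="D" version="1.2.1" targetFramework="net40" direct="false" />
  <package id="E" version="1.2.1" targetFramework="net40" />
  <package id="F" version="1.3.5" targetFramework="net40" direct="false" />
</packages>
"""


def split_packages(xml_text):
    """Return (direct ids, indirect ids) in file order."""
    root = ET.fromstring(xml_text)
    direct, indirect = [], []
    for pkg in root.findall("package"):
        # No attribute means "direct", matching today's files.
        is_direct = pkg.get("direct", "true").lower() != "false"
        (direct if is_direct else indirect).append(pkg.get("id"))
    return direct, indirect


direct, indirect = split_packages(CONFIG)
print("direct:", direct)
print("indirect:", indirect)
```

A client that ignores unknown attributes would still see all five packages, which is the backwards-compatible behavior described above.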
Oct 15, 2013 at 5:38 PM
@tilovell: Just to point out:
  • There already exists an "allowedVersions" attribute which can be used in packages.config to specify version ranges
  • for projects which produce a nuget package, there is some duplication at the moment: the version ranges can/need to be specified both in packages.config, and in the nuspec
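One way tooling could remove that duplication is to derive the nuspec dependency list from packages.config. The Python below is only a sketch under assumptions: `nuspec_dependencies` is a hypothetical helper, it falls back to the installed version when no "allowedVersions" is present, and it emits every package in the file because the current format can't tell direct from indirect ones.

```python
import xml.etree.ElementTree as ET

PACKAGES_CONFIG = """
<packages>
  <package id="B" version="1.1.0" targetFramework="net40" allowedVersions="[1.1,2.0)" />
  <package id="E" version="1.2.1" targetFramework="net40" />
</packages>
"""


def nuspec_dependencies(xml_text):
    """Build a nuspec-style <dependencies> fragment from packages.config text."""
    root = ET.fromstring(xml_text)
    deps = ET.Element("dependencies")
    for pkg in root.findall("package"):
        # Prefer the authored range; otherwise use the installed version,
        # which a nuspec reads as a minimum version.
        version = pkg.get("allowedVersions") or pkg.get("version")
        ET.SubElement(deps, "dependency", id=pkg.get("id"), version=version)
    return ET.tostring(deps, encoding="unicode")


print(nuspec_dependencies(PACKAGES_CONFIG))
```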