Resolved LINQ to get list of objects that contain an exact list in a complex structure

ColinMay80

Member
Joined
Sep 27, 2021
Messages
10
Programming Experience
3-5
This is a silly example but hope you get the idea.

Say I have a list of Persons, that contains a list of jeans which contains a list of features. In class, Feature, we have a Name property that is a FeatureType.

How do I check (and return the filtered list) and for example I have a list of 5 persons each having multiple pairs of jeans and 3 persons have jeans that match the criteria of having jeans that have a feature name of TORN and LOOSE?

C#:
class Person
{
   List<Jean> Jeans
}

class Jean
{
     List<Feature> Features
}

class Feature
{
    string ProductCode
    FeatureType Name
}

public enum FeatureType 
{
    TORN,
    SKINNY,
    LOOSE,
    TAPERED
}
 
Last edited:
In the future, please post your code as text in code tags. Use the toolbar icon that looks like </>.

Anyway, regarding your question, what's wrong with just a basic set of nest loops and adding those items that satisfy the conditions into a list. Or are you required to using LINQ? Who made that requirement?
 
Apologies, I will remember to post code in tags in the future.

Regarding using LINQ, I want to use it because nested loops are nasty in terms of performance and readability.
 
nested loops are nasty in terms of performance

Not necessarily true, and in any case LINQ is just a hidden set of loops (look at the source code of Enumerable.cs) and potentially a boatload of enumerators and other ops which may impact performance more.. One has to be quite careful to appreciate how it's implemented under the hood; do not assume that a LINQ solution outperforms a straight loops one, as it's usually slower

Readability is a "beauty is in the eye of the beholder thing"

C#:
var want = new FeatureType[]  { FeatureType.TORN, FeatureType.LOOSE };

var ripped = people.Where(p => p.Jeans.Any(j => j.Features.Any(f => want.Contains(f.Name))));


Your code is slightly wonky; I assume your enum was supposed to be called FeatureType. Do not use plurals in enum names, except for Flags enums. Do not use ALL CAPS on enum members
 
Last edited:
Thanks, yes enum was meant to be called FeatureType. Agree that better performance isn't always true but your 2 lines of code is way better than nested loops :)

The enum list is everything I want so in the above example with FeatureType.TORN and FeatureType.RIPPED I want tornAndRipped and not just ripped. Do I not need an All instead of Any somewhere?
 
Last edited:
Agree that better performance isn't always true but your 2 lines of code is way better than nested loops :)
Don't mistake brevity of code for better performance. As already stated, LINQ queries simply hide the complexity from you, rather than eliminating it. Those nested Any calls are not going to be implemented by magic. They will effectively use the nested loops that you claim you want to avoid. In general, LINQ code will perform worse than the well-written alternative. That's largely because one has to be written to handle various cases whereas you will always write code for your specific case. The difference will often be imperceptible but it will be there. Using LINQ for performance reasons just means that you don't understand LINQ. This is basically all that code does:
C#:
var ripped = new List<Person>();

foreach (var p in people)
{
    var include = false;

    foreach (var j in p.Jeans)
    {
        foreach (var f in j.Features)
        {
            if (want.Contains(f.Name))
            {
                include = true;

                break;
            }

            if (include)
            {
                break;
            }
        }

        if (include)
        {
            break;
        }
    }

    if (include)
    {
        ripped.Add(p);
    }
}
It's not exactly the same but pretty close. Obviously the LINQ query is easier on the eye but, while I haven't tested, I would expect it to perform better. Note that you'd need to include a call to ToList in the original code to make them equivalent.
 
Don't mistake brevity of code for better performance. As already stated, LINQ queries simply hide the complexity from you, rather than eliminating it. Those nested Any calls are not going to be implemented by magic. They will effectively use the nested loops that you claim you want to avoid. In general, LINQ code will perform worse than the well-written alternative. That's largely because one has to be written to handle various cases whereas you will always write code for your specific case. The difference will often be imperceptible but it will be there. Using LINQ for performance reasons just means that you don't understand LINQ. This is basically all that code does:
C#:
var ripped = new List<Person>();

foreach (var p in people)
{
    var include = false;

    foreach (var j in p.Jeans)
    {
        foreach (var f in j.Features)
        {
            if (want.Contains(f.Name))
            {
                include = true;

                break;
            }

            if (include)
            {
                break;
            }
        }

        if (include)
        {
            break;
        }
    }

    if (include)
    {
        ripped.Add(p);
    }
}
It's not exactly the same but pretty close. Obviously the LINQ query is easier on the eye but, while I haven't tested, I would expect it to perform better. Note that you'd need to include a call to ToList in the original code to make them equivalent.

Thanks for the reply, sometimes I spend so much time just to implement using LINQ so its more declarative and easier on the eye. Guess if I couldn't get it to work I will go back to nested loops (y)
 
Thanks, yes enum was meant to be called FeatureType. Agree that better performance isn't always true but your 2 lines of code is way better than nested loops :)

The enum list is everything I want so in the above example with FeatureType.TORN and FeatureType.RIPPED I want tornAndRipped and not just ripped. Do I not need an All instead of Any somewhere?
You'd use All, but you wouldn't call it on Features; you want to know "are all of TORN and RIPPED present in Features" but because Features is a list of objects that is more than just the types, we need to have a check that "for all types, any feature contains the type"

C#:
var peopleWithAtLeast1Torn_And_LooseJean = people.Where(p => p.Jeans.Any(j => want.All(w => j.Features.Any(f => f.Name == w))));

This is hard to write out in english, but it's like "people where any of their jeans have all of (the wanted types) present in any of the particular jean's features"

If you want "people who have at least one jean that is TORN and at least one different jean that is LOOSE" that's different again
 
Last edited:
Things would be so much easier if you had defined FeatureType to be a flag, and assumed that he product code is with the jeans, and not with the feature.

C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

var people = new List<Person>()
{
    new Person("John")
    {
        Jeans = new ()
        {
            new Jeans("T", Features.Torn),
            new Jeans("TS", Features.Torn | Features.Skinny)
        }
    },
    new Person("Paul")
    {
        Jeans = new ()
        {
            new Jeans("S", Features.Skinny),
            new Jeans("TL", Features.Torn | Features.Loose)
        }
    },
    new Person("Ringgo")
    {
        Jeans = new()
        {
            new Jeans("L", Features.Loose),
            new Jeans("TTP", Features.Torn | Features.Tapered)
        }
    },
};

var want = Features.Torn | Features.Loose;
var results = people.Where(p => p.Jeans.Any(j => j.Features == want));
foreach(var result in results)
    Console.WriteLine(result);

record Person(string Name)
{
    public List<Jeans> Jeans { get; init; }

    public override string ToString()
    {
        var sb = new StringBuilder();
        sb.Append($"Person: {{ Name = {Name}, Jeans = [ ");
        sb.Append(string.Join(", ", Jeans));
        sb.Append("] }");
        return sb.ToString();
    }
}
record Jeans(string ProductCode, Features Features);

[Flags]
public enum Features
{
    Torn = 1,
    Skinny = 2,
    Loose = 4,
    Tapered = 8
}
 
As compared to your current data structure:
C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

var people = new List<Person>()
{
    new Person("John")
    {
        Jeans = new()
        {
            new() { Features = new() { new("T", FeatureType.Torn) } },
            new() { Features = new() { new("T", FeatureType.Torn), new("S", FeatureType.Skinny) } },
        }
    },
    new Person("Paul")
    {
        Jeans = new()
        {
            new() { Features = new() { new("S", FeatureType.Skinny) } },
            new() { Features = new() { new("T", FeatureType.Torn), new("L", FeatureType.Loose) } },
        }
    },
    new Person("Ringgo")
    {
        Jeans = new()
        {
            new() { Features = new() { new("L", FeatureType.Loose) } },
            new() { Features = new() { new("T", FeatureType.Torn), new("TP", FeatureType.Tapered) } },
        }
    },
};

var want = new[] { FeatureType.Torn, FeatureType.Loose };
var results = people.Where(p => p.Jeans
                                 .Any(j => j.Features
                                            .Select(f => f.Name)
                                            .Intersect(want)
                                            .Count() == want.Length
                                     )
                          );
foreach(var result in results)
    Console.WriteLine(result);


record Person(string Name)
{
    public List<Jeans> Jeans { get; init; }

    public override string ToString()
    {
        var sb = new StringBuilder();
        sb.Append($"Person: {{ Name = {Name}, Jeans = [ ");
        sb.Append(string.Join(", ", Jeans));
        sb.Append("] }");
        return sb.ToString();
    }
}

record Jeans
{
    public List<Feature> Features;

    public override string ToString()
    {
        var sb = new StringBuilder();
        sb.Append("Jeans: { Features = [ ");
        sb.Append(string.Join(", ", Features));
        sb.Append("] }");
        return sb.ToString();
    }
}

record Feature(string ProductCode, FeatureType Name);

public enum FeatureType
{
    Torn,
    Skinny,
    Loose,
    Tapered
}
 
In the flags version, maybe consider bitwise &'ing the actual features the jeans have when checking equality to Torn|Loose if you want to allow e.g. Torn|Loose|Tapered to qualify as having Torn|Loose

i.e. j => (j.Features & want) == want)
 
Here's my benchmark run of LINQ vs nested for loops:
Code:
BenchmarkDotNet=v0.13.2, OS=Windows 10 (10.0.19044.2251/21H2/November2021Update)
AMD Ryzen 7 PRO 2700U w/ Radeon Vega Mobile Gfx, 1 CPU, 8 logical and 4 physical cores
.NET SDK=7.0.100
  [Host]     : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2
  DefaultJob : .NET 7.0.0 (7.0.22.51805), X64 RyuJIT AVX2


|        Method |     Mean |     Error |    StdDev |   Median |
|-------------- |---------:|----------:|----------:|---------:|
|     UsingLinq | 3.309 us | 0.3043 us | 0.8973 us | 3.898 us |
| UsingForLoops | 1.240 us | 0.1295 us | 0.3818 us | 1.482 us |

The nested for loops code I have is pretty naive without a lot of optimizations in it:
C#:
var results = new List<Person>();
foreach(var person in People)
{
    foreach(var jeans in person.Jeans)
    {
        var found = new HashSet<FeatureType>();
        foreach(var feature in jeans.Features)
            found.Add(feature.Name);

        if (found.SetEquals(Want))
        {
            results.Add(person);
            break;
        }
    }
}

Code used to produce benchmarks in spoiler below:
C#:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.Diagnostics.Runtime.Utilities;
using PEFile;
using System.Text;

namespace BenchmarkLinqVsFor;

public class Program
{
    readonly List<Person> People = new()
    {
        new Person("John")
        {
            Jeans = new()
            {
                new() { Features = new() { new("T", FeatureType.Torn) } },
                new() { Features = new() { new("T", FeatureType.Torn), new("S", FeatureType.Skinny) } },
            }
        },
        new Person("Paul")
        {
            Jeans = new()
            {
                new() { Features = new() { new("S", FeatureType.Skinny) } },
                new() { Features = new() { new("T", FeatureType.Torn), new("L", FeatureType.Loose) } },
            }
        },
        new Person("Ringgo")
        {
            Jeans = new()
            {
                new() { Features = new() { new("L", FeatureType.Loose) } },
                new() { Features = new() { new("T", FeatureType.Torn), new("TP", FeatureType.Tapered) } },
            }
        },
    };

    readonly FeatureType[] Want = new[] { FeatureType.Torn, FeatureType.Loose };

    [Benchmark]
    public IList<Person> UsingLinq()
    {
        return People.Where(p => p.Jeans
                                  .Any(j => j.Features
                                             .Select(f => f.Name)
                                             .Intersect(Want)
                                             .Count() == Want.Length
                                      )
                           )
                     .ToList();
    }

    [Benchmark]
    public IList<Person> UsingForLoops()
    {
        var results = new List<Person>();
        foreach(var person in People)
        {
            foreach(var jeans in person.Jeans)
            {
                var found = new HashSet<FeatureType>();
                foreach(var feature in jeans.Features)
                    found.Add(feature.Name);

                if (found.SetEquals(Want))
                {
                    results.Add(person);
                    break;
                }
            }
        }
        return results;
    }

    static void Main()
    {
#if DEBUG
        Console.WriteLine("In debug build...");
        var program = new Program();
        foreach (var person in program.UsingLinq())
            Console.WriteLine(person);
        foreach (var person in program.UsingForLoops())
            Console.WriteLine(person);
#else
        Console.WriteLine("Benchmarking...");
        BenchmarkRunner.Run(typeof(Program).Assembly);
#endif
    }
}

public record Person(string Name)
{
    public List<Jeans> Jeans { get; init; } = new List<Jeans>();

    public override string ToString()
    {
        var sb = new StringBuilder();
        sb.Append($"Person: {{ Name = {Name}, Jeans = [ ");
        sb.Append(string.Join(", ", Jeans));
        sb.Append("] }");
        return sb.ToString();
    }
}

public record Jeans
{
    public List<Feature> Features { get; init; } = new List<Feature>();

    public override string ToString()
    {
        var sb = new StringBuilder();
        sb.Append("Jeans: { Features = [ ");
        sb.Append(string.Join(", ", Features));
        sb.Append("] }");
        return sb.ToString();
    }
}

public record Feature(string ProductCode, FeatureType Name);

public enum FeatureType
{
    Torn,
    Skinny,
    Loose,
    Tapered
}
 
Before someone says that its not an apples-to-apples comparison with the LINQ using Intersect() and the for loops using HashSet, here's updated benchmarks:

Code:
|                 Method |       Mean |     Error |   StdDev |     Median |
|----------------------- |-----------:|----------:|---------:|-----------:|
|     UsingLinqIntersect | 3,418.1 ns | 315.87 ns | 931.3 ns | 3,921.2 ns |
| UsingForLoopsIntersect |   719.0 ns |  78.92 ns | 232.7 ns |   880.4 ns |
|       UsingLinqHashSet | 3,078.0 ns | 315.08 ns | 878.3 ns | 3,315.4 ns |
|   UsingForLoopsHashSet | 1,676.5 ns | 131.82 ns | 386.6 ns | 1,621.8 ns |

// * Warnings *
MultimodalDistribution
  Program.UsingLinqIntersect: Default     -> It seems that the distribution can have several modes (mValue = 2.82)
  Program.UsingForLoopsIntersect: Default -> It seems that the distribution can have several modes (mValue = 3.2)
  Program.UsingForLoopsHashSet: Default   -> It seems that the distribution can have several modes (mValue = 2.89)

It's late though, so I may have screwed up some of the implementations:
C#:
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.Diagnostics.Runtime.Utilities;
using PEFile;
using System.Text;

namespace BenchmarkLinqVsFor;

public class Program
{
    readonly List<Person> People = new()
    {
        new Person("John")
        {
            Jeans = new()
            {
                new() { Features = new() { new("T", FeatureType.Torn) } },
                new() { Features = new() { new("T", FeatureType.Torn), new("S", FeatureType.Skinny) } },
            }
        },
        new Person("Paul")
        {
            Jeans = new()
            {
                new() { Features = new() { new("S", FeatureType.Skinny) } },
                new() { Features = new() { new("T", FeatureType.Torn), new("L", FeatureType.Loose) } },
            }
        },
        new Person("Ringgo")
        {
            Jeans = new()
            {
                new() { Features = new() { new("L", FeatureType.Loose) } },
                new() { Features = new() { new("T", FeatureType.Torn), new("TP", FeatureType.Tapered) } },
            }
        },
    };

    readonly FeatureType[] Want = new[] { FeatureType.Torn, FeatureType.Loose };

    [Benchmark]
    public IList<Person> UsingLinqIntersect()
    {
        return People.Where(p => p.Jeans
                                  .Any(j => j.Features
                                             .Select(f => f.Name)
                                             .Intersect(Want)
                                             .Count() == Want.Length
                                      )
                           )
                     .ToList();
    }

    [Benchmark]
    public IList<Person> UsingForLoopsIntersect()
    {
        var results = new List<Person>();
        foreach(var person in People)
        {
            foreach(var jeans in person.Jeans)
            {
                var found = new List<FeatureType>();
                foreach(var feature in jeans.Features)
                    found.Add(feature.Name);

                var intersection = new List<FeatureType>();
                foreach(var want in Want)
                {
                    if (found.Contains(want))
                        intersection.Add(want);
                }

                if (intersection.Count == Want.Length)
                {
                    results.Add(person);
                    break;
                }
            }
        }
        return results;
    }

    [Benchmark]
    public IList<Person> UsingLinqHashSet()
    {
        return People.Where(p => p.Jeans
                                  .Any(j => j.Features
                                             .Select(f => f.Name)
                                             .ToHashSet()
                                             .SetEquals(Want)
                                      )
                           )
                     .ToList();
    }

    [Benchmark]
    public IList<Person> UsingForLoopsHashSet()
    {
        var results = new List<Person>();
        foreach (var person in People)
        {
            foreach (var jeans in person.Jeans)
            {
                var found = new HashSet<FeatureType>();
                foreach (var feature in jeans.Features)
                    found.Add(feature.Name);

                if (found.SetEquals(Want))
                {
                    results.Add(person);
                    break;
                }
            }
        }
        return results;
    }

    static void Main()
    {
#if DEBUG
        Console.WriteLine("In debug build...");
        var program = new Program();
        foreach (var person in program.UsingLinq())
            Console.WriteLine(person);
        foreach (var person in program.UsingForLoops())
            Console.WriteLine(person);
#else
        Console.WriteLine("Benchmarking...");
        BenchmarkRunner.Run(typeof(Program).Assembly);
#endif
    }
}

public record Person(string Name)
{
    public List<Jeans> Jeans { get; init; } = new List<Jeans>();

    public override string ToString()
    {
        var sb = new StringBuilder();
        sb.Append($"Person: {{ Name = {Name}, Jeans = [ ");
        sb.Append(string.Join(", ", Jeans));
        sb.Append("] }");
        return sb.ToString();
    }
}

public record Jeans
{
    public List<Feature> Features { get; init; } = new List<Feature>();

    public override string ToString()
    {
        var sb = new StringBuilder();
        sb.Append("Jeans: { Features = [ ");
        sb.Append(string.Join(", ", Features));
        sb.Append("] }");
        return sb.ToString();
    }
}

public record Feature(string ProductCode, FeatureType Name);

public enum FeatureType
{
    Torn,
    Skinny,
    Loose,
    Tapered
}
 
Indeed; one should weigh up the look of LINQ (I love it, but it's not every developer's cup of tea, and some LINQ method syntax is uuuugly) with the potential knock to performance..

..but in that keeping a level head on that perf figures actually mean - we're talking about microseconds or nanoseconds here: we need to make millions or billions of checks of torn/loose garments for LINQ to have wasted an appreciable number of seconds of server CPU time. If it'll take users of the website 3 years to chew through a million checks, it probably isn't a useful optimization to make. If the lookup is performed millions of times a day, it might make more sense

Perhaps "if it's fast/simple to write and easy for everyone to grok, do it now. If time and experience proves it's a bottleneck, remove it"
 
Back
Top Bottom