How to remove duplicates from List<string[]>

ram_rocks

Hello all,
Can anyone please give me an idea of how to remove duplicates from the list of string arrays below?

I have a final list, List<string[]> finaldata, into which I copied all the data from an Excel sheet (2000 rows, 12 columns).
1) In some of the rows I have the same data in all columns, and I would like to remove those from the list.
2) Some rows have the same data in 2 or 3 columns and different data in the rest; these can stay in the list.

Please let me know how I can implement this in an optimized way.
 
How are you setting the data in your string array?

The place where you populate the list is also the place to stop unwanted duplicates from being added.
 
If you use a HashSet<T>, you can avoid duplicates being entered. Note that for string[] rows it needs a custom equality comparer, because arrays are compared by reference by default. If you don't want to go that route, you will need to check each string array being processed against the entries you already have in your list.
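To make the HashSet route concrete, here is a minimal sketch, not a definitive implementation. The names RowComparer, RowLoader, LoadDistinct and sourceRows are made up for illustration; sourceRows stands in for whatever code reads the Excel sheet. The comparer is needed because a plain HashSet<string[]> only checks whether two arrays are the same object.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two rows are equal when every cell matches, in order.
public sealed class RowComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[] x, string[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        return x != null && y != null && x.SequenceEqual(y);
    }

    public int GetHashCode(string[] row)
    {
        if (row == null) return 0;
        int hash = 17;
        foreach (string cell in row)
        {
            // Null cells (e.g. empty Excel cells) contribute 0 to the hash.
            hash = unchecked(hash * 31 + (cell == null ? 0 : cell.GetHashCode()));
        }
        return hash;
    }
}

public static class RowLoader
{
    // Hypothetical helper: keeps only the first occurrence of each row.
    public static List<string[]> LoadDistinct(IEnumerable<string[]> sourceRows)
    {
        var seen = new HashSet<string[]>(new RowComparer());
        var result = new List<string[]>();
        foreach (string[] row in sourceRows)
        {
            // HashSet.Add returns false when an equal row is already present.
            if (seen.Add(row))
            {
                result.Add(row);
            }
        }
        return result;
    }
}
```

The separate List keeps the rows in the order they were read, which a HashSet on its own does not promise.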
 
Assuming that you cannot prevent the duplicates getting into the list in the first place, here's an option for removing them. Firstly, define this class:
C#:
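using System.Collections.Generic;
using System.Linq;

public class ArrayEqualityComparer<TArray> : EqualityComparer<TArray[]>
{
    /// <inheritdoc />
    public override bool Equals(TArray[] x, TArray[] y)
    {
        // The same reference (including two nulls) counts as equal.
        if (ReferenceEquals(x, y)) return true;
        return x != null && y != null && x.SequenceEqual(y);
    }

    /// <inheritdoc />
    public override int GetHashCode(TArray[] obj)
    {
        if (obj == null) return 0;
        // Order-sensitive hash that tolerates null elements (e.g. empty cells).
        unchecked
        {
            return obj.Aggregate(17, (current, s) => current * 31 + (s == null ? 0 : s.GetHashCode()));
        }
    }
}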
You can then use the class like this:
C#:
finaldata = finaldata.Distinct(new ArrayEqualityComparer<string>()).ToList();
Two things to note there:
  1. That only considers two arrays to be the same if the elements are in the same order.
  2. It will create a new List<string[]> object rather than modifying the existing one.
I don't think the first point will be an issue for you but, if the second point is a problem, let me know and I'll provide an implementation for the more complex option of removing items in-place.
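For the in-place option mentioned above, one possible sketch is shown below. The extension method name RemoveDuplicatesInPlace and the class ListDedupExtensions are hypothetical. It keeps the first occurrence of each item, compacts the survivors toward the front of the same list, and trims the tail.

```csharp
using System;
using System.Collections.Generic;

public static class ListDedupExtensions
{
    // Hypothetical helper: removes later duplicates from 'list' itself and
    // returns how many items were removed. Equality is decided by 'comparer'.
    public static int RemoveDuplicatesInPlace<T>(this List<T> list, IEqualityComparer<T> comparer)
    {
        if (list == null) throw new ArgumentNullException(nameof(list));

        var seen = new HashSet<T>(comparer);
        int write = 0;
        for (int read = 0; read < list.Count; read++)
        {
            // Add returns true only the first time an item is seen.
            if (seen.Add(list[read]))
            {
                list[write] = list[read];
                write++;
            }
        }

        // Everything from 'write' onward is a leftover copy; drop it.
        int removed = list.Count - write;
        list.RemoveRange(write, removed);
        return removed;
    }
}
```

With the comparer class defined earlier in this post, the call would look like finaldata.RemoveDuplicatesInPlace(new ArrayEqualityComparer<string>());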
 
1) In some of the rows I have the same data in all columns, and I would like to remove those from the list.
I guess I misread the description above. I interpreted that to mean that the entire row has all the same data (e.g. { "magic missile", "magic missile", "magic missile", "magic missile" }), as opposed to what was described in 2), where some of the data may be different (e.g. { "magic missile", "magic missile", "fireball", "magic missile" }).

With my interpretation, I would have done something like:
C#:
finaldata = finaldata.Where(r => !r.All(c => c == r[0])).ToList();
 
@Skydiver @jmcilhinney
I mean something like this:
row[1] = abc, xyz, ford, ferrari, bmw, benz, honda
row[2] = abc, xyz, gm, maruthi, ola, ather, tvs
row[3] = abc, sxc, ford, ferrari, bmw, benz, honda
row[4] = abc, xyz, gm, ford, ola, ather, tvs

Here row1 and row3 have the same data from the 3rd column onward, so I would like to remove row3 from the list.
Basically, I would like to compare rows from the 3rd column to the end, and if two rows have the same data in those columns, remove the other row (e.g. row3).

Rows 2 and 4 differ in the 4th column, so I would like to keep both of them.
 
I recommend using the Distinct() overload that takes an IEqualityComparer<T>.

Your IEqualityComparer<T> implementation can use the SequenceEqual() extension method.
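As a rough sketch of that suggestion applied to the "3rd column onward" requirement, assuming rows are string[] and that columns before the start index should simply be ignored: the names TailColumnsComparer and TailDedupDemo are invented for this example, and the sample rows are the ones from the post above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: two rows are equal when all cells from 'startColumn'
// (zero-based) to the end match, in order. Earlier columns are ignored.
public sealed class TailColumnsComparer : IEqualityComparer<string[]>
{
    private readonly int _startColumn;

    public TailColumnsComparer(int startColumn)
    {
        if (startColumn < 0) throw new ArgumentOutOfRangeException(nameof(startColumn));
        _startColumn = startColumn;
    }

    public bool Equals(string[] x, string[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return x.Skip(_startColumn).SequenceEqual(y.Skip(_startColumn));
    }

    public int GetHashCode(string[] row)
    {
        if (row == null) return 0;
        int hash = 17;
        // Hash only the columns that Equals compares, so the two stay consistent.
        foreach (string cell in row.Skip(_startColumn))
        {
            hash = unchecked(hash * 31 + (cell == null ? 0 : cell.GetHashCode()));
        }
        return hash;
    }
}

public static class TailDedupDemo
{
    public static List<string[]> Run()
    {
        var finaldata = new List<string[]>
        {
            new[] { "abc", "xyz", "ford", "ferrari", "bmw", "benz", "honda" },
            new[] { "abc", "xyz", "gm", "maruthi", "ola", "ather", "tvs" },
            new[] { "abc", "sxc", "ford", "ferrari", "bmw", "benz", "honda" },
            new[] { "abc", "xyz", "gm", "ford", "ola", "ather", "tvs" },
        };

        // The 3rd column is zero-based index 2.
        return finaldata.Distinct(new TailColumnsComparer(2)).ToList();
    }
}
```

For the sample data this leaves three rows: the row containing "sxc" is dropped as a duplicate of the first row, while the two "gm" rows both stay because they differ in the 4th column.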
 