Resolved Compare two xml files and create new

PD1991

Hello,

I am creating two XML files by serializing a class object, and I want to compare the two files and create a new one from them.
Is there a simple way to do this?
E.g.
* Case 1
File1.xml
C#:
<Model>
<ChildGroups>
<Group>
<Name>Grp1</Name>
</Group>
<Group>
<Name>Grp2</Name>
</Group>
</ChildGroups>
<ChildGroups>
<Group>
<Name>Grp3</Name>
</Group>
<Group>
<Name>Grp4</Name>
</Group>
</ChildGroups>
</Model>

File2.xml
C#:
<Model>
<ChildGroups>
<Group>
<Name>Grp1</Name>
</Group>
</ChildGroups>
<ChildGroups>
<Group>
<Name>Grp3</Name>
</Group>
<Group>
<Name>Grp4</Name>
</Group>
</ChildGroups>
</Model>

Output.xml
C#:
<Model>
<ChildGroups>
<Group>
<Name>Grp1</Name>
</Group>
<Group>
<Name>Grp2_ToBeDeleted</Name> //Renamed because it does not exist in File2.xml
</Group>
</ChildGroups>
<ChildGroups>
<Group>
<Name>Grp3</Name>
</Group>
<Group>
<Name>Grp4</Name>
</Group>
</ChildGroups>
</Model>

* Case 2
File1.xml
C#:
<Model>
<ChildGroups>
<Group>
<Name>Grp1</Name>
</Group>
</ChildGroups>
<ChildGroups>
<Group>
<Name>Grp4</Name>
</Group>
<Group>
<Name>Grp4</Name>
</Group>
</ChildGroups>
</Model>

File2.xml
C#:
<Model>
<ChildGroups>
<Group>
<Name>Grp1</Name>
</Group>
<Group>
<Name>Grp2</Name>
</Group>
</ChildGroups>
<ChildGroups>
<Group>
<Name>Grp3</Name>
</Group>
<Group>
<Name>Grp4</Name>
</Group>
</ChildGroups>
</Model>

Output.xml
C#:
<Model>
<ChildGroups>
<Group>
<Name>Grp1</Name>
</Group>
<Group>
<Name>Grp2</Name> //Add Grp2 as it exists in File2.xml
</Group>
</ChildGroups>
<ChildGroups>
<Group>
<Name>Grp3</Name>
</Group>
<Group>
<Name>Grp4</Name>
</Group>
</ChildGroups>
</Model>
 
From what I can see, you want File1 to look like File2. Why not just use File2 and be done with it?
 
I shared just an example. In my case there are more than 2000 groups, and the number of groups can differ between file1 and file2.
Nodes may be missing from either file1 or file2. Whatever is missing from one file should be added from the other.
 
Why don't you show us the code you currently have, compared with what you want? Or, if you don't have any yet, some pseudocode. Right now it's not quite clear to me exactly what you want to happen.

As an aside, there are some great published papers out there on diffing trees, and XML parse trees in particular. These can be the basis of your code.
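
The published tree-diff algorithms mentioned here are fairly involved. As a much simpler stand-in (not one of those algorithms), the sketch below keys every Group by the chain of Name values of its enclosing groups and compares the two key sets. It assumes each Group has a Name child, as in the samples in this thread; GroupDiff, PathKey, Keys and Compare are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class GroupDiff
{
    // Key such as "Grp1/Grp1a": the Name of each enclosing Group, outermost first.
    // Assumption: names do not contain '/', otherwise two keys could collide.
    public static string PathKey(XElement group) =>
        string.Join("/", group.AncestorsAndSelf("Group").Reverse()
            .Select(g => (string)g.Element("Name") ?? "?"));

    // Collects the key of every Group element in the document.
    public static HashSet<string> Keys(XDocument doc) =>
        new HashSet<string>(doc.Descendants("Group").Select(g => PathKey(g)));

    // Returns the keys found in only one of the two documents, sorted ordinally.
    public static (List<string> OnlyInFirst, List<string> OnlyInSecond) Compare(
        XDocument first, XDocument second)
    {
        var a = Keys(first);
        var b = Keys(second);
        return (a.Except(b).OrderBy(k => k, StringComparer.Ordinal).ToList(),
                b.Except(a).OrderBy(k => k, StringComparer.Ordinal).ToList());
    }
}

public static class Program
{
    public static void Main()
    {
        var file1 = XDocument.Parse(
            "<Model><ChildGroups><Group><Name>Grp1</Name></Group><Group><Name>Grp2</Name></Group></ChildGroups></Model>");
        var file2 = XDocument.Parse(
            "<Model><ChildGroups><Group><Name>Grp1</Name></Group><Group><Name>Grp3</Name></Group></ChildGroups></Model>");

        var diff = GroupDiff.Compare(file1, file2);
        Console.WriteLine("Only in first:  " + string.Join(", ", diff.OnlyInFirst));
        Console.WriteLine("Only in second: " + string.Join(", ", diff.OnlyInSecond));
    }
}
```

A set comparison like this only reports which groups differ; it does not detect renames or moves, which is where the real tree-diff algorithms earn their keep.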
 
Below is my current code for your reference. It only adds the nodes that are missing in File2.xml, whereas I want to combine both files into a new XML file and then iterate over it to mark nodes as '_ToBeDeleted'.
That is not happening.
Case 1 :
File 1
<Group><Name>ABSR</Name></Group>
File 2
<Group><Name>ABSR</Name></Group>
Output
<Group><Name>ABSR</Name></Group>

Case 2 :
File 1
<Group><Name>ABSR</Name></Group>
File 2
<Group><Name>ABSR_001</Name></Group>
Output
<Group><Name>ABSR_ToBeDeleted</Name></Group> //As it's not in File 2
<Group><Name>ABSR_001</Name></Group>

Case 3 :
File 1
<Group><Name>ABSR_ToBeDeleted</Name></Group>
File 2
<Group><Name>ABSR</Name></Group>
Output
<Group><Name>ABSR</Name></Group>

Case 4 :
File 1
<Group><Name>ABSR__001_ToBeDeleted</Name></Group>
File 2
<Group><Name>ABSR</Name></Group>
Output
<Group><Name>ABSR__001_ToBeDeleted</Name></Group>
<Group><Name>ABSR</Name></Group>

C#:
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;

var result = new XmlXPathDocument();
var child = new XmlXPathDocument();

child.Load("File2.xml");
result.Load("File1.xml");


child.AddDiscriminantAttribute("Group", string.Empty);
child.InjectXml(result);


public class XmlXPathDocument : XmlDocument
  {
    public const string XmlNamespaceUri = "http://www.w3.org/2000/xmlns/";
    public const string XmlNamespacePrefix = "xmlns";

    internal List<Tuple<string, string>> _discriminantAttributes = new List<Tuple<string, string>>();

    public XmlXPathDocument() => Construct();
    public XmlXPathDocument(XmlNameTable nameTable) : base(nameTable) => Construct();
    public XmlXPathDocument(XmlImplementation implementation) : base(implementation) => Construct();

    protected virtual void Construct() => XPathNamespaceManager = new XmlNamespaceManager(new NameTable());

    public virtual XmlNamespaceManager XPathNamespaceManager { get; private set; }

    public override XmlElement CreateElement(string prefix, string localName, string namespaceURI) => new XmlXPathElement(prefix, localName, namespaceURI, this);

    public override XmlCDataSection CreateCDataSection(string data) => new XmlXPathCDataSection(data, this);

    public override XmlText CreateTextNode(string text) => new XmlXPathText(text, this);

    public virtual void AddDiscriminantAttribute(string name, string namespaceURI)
    {
      if (name == null)
        throw new ArgumentNullException(nameof(name));

      _discriminantAttributes.Add(new Tuple<string, string>(name, namespaceURI));
    }

    public virtual bool IsDiscriminant(XmlAttribute attribute)
    {
      if (attribute == null)
        throw new ArgumentNullException(nameof(attribute));

      foreach (var pair in _discriminantAttributes)
      {
        string ns = Nullify(attribute.NamespaceURI);
        string dns = Nullify(pair.Item2);
        if (ns == dns && pair.Item1 == attribute.LocalName)
          return true;
      }
      return false;
    }

    private static string Nullify(string text)
    {
      if (text == null)
        return null;

      text = text.Trim();
      if (text.Length == 0)
        return null;

      return text;
    }

    internal string GetPrefix(string namespaceURI)
    {
      if (string.IsNullOrEmpty(namespaceURI))
        return null;

      string prefix = XPathNamespaceManager.LookupPrefix(namespaceURI);
      if (!string.IsNullOrEmpty(prefix))
      {
        XPathNamespaceManager.AddNamespace(prefix, namespaceURI);
        return prefix;
      }

      string newPrefix;
      int index = 0;
      do
      {
        newPrefix = "ns" + index;
        if (XPathNamespaceManager.LookupNamespace(newPrefix) == null)
          break;

        index++;
      }
      while (true);
      XPathNamespaceManager.AddNamespace(newPrefix, namespaceURI);
      return newPrefix;
    }

    private static bool IsNamespaceAttribute(XmlAttribute attribute)
    {
      if (attribute == null)
        return false;

      return attribute.NamespaceURI == XmlNamespaceUri && attribute.Prefix == XmlNamespacePrefix;
    }

    private static IEnumerable<XmlAttribute> GetAttributes(IXmlXPathNode node)
    {
      var xe = node as XmlElement;
      if (xe == null)
        yield break;

      foreach (XmlAttribute att in xe.Attributes)
      {
        yield return att;
      }
    }

    private static XmlAttribute GetAttribute(IXmlXPathNode node, string name) => node is XmlElement xe ? xe.Attributes[name] : null;
    private static XmlAttribute GetAttribute(IXmlXPathNode node, string localName, string ns) => node is XmlElement xe ? xe.Attributes[localName, ns] : null;

    public virtual bool InjectXml(XmlDocument target)
    {
      if (target == null)
        throw new ArgumentNullException(nameof(target));

      if (DocumentElement == null)
        return false;

      bool changed = false;
      foreach (XmlNode node in SelectNodes("//node()"))
      {
        var xelement = node as IXmlXPathNode;
        if (xelement == null)
          continue;

        if (string.IsNullOrEmpty(xelement.XPathExpression))
          continue;

        XmlNode other = target.SelectSingleNode(xelement.XPathExpression, XPathNamespaceManager);
        if (other != null)
        {
          if (other is XmlElement otherElement)
          {
            foreach (XmlAttribute att in GetAttributes(xelement))
            {
              if (IsNamespaceAttribute(att))
                continue;

              if (otherElement.Attributes[att.LocalName, att.NamespaceURI]?.Value != att.Value)
              {
                otherElement.SetAttribute(att.LocalName, att.NamespaceURI, att.Value);
                changed = true;
              }
            }
            continue;
          }
        }

        if (node is XmlXPathElement element)
        {
          XmlElement parent = EnsureTargetParent(xelement, target, out changed);
          XmlElement targetElement = target.CreateElement(element.LocalName, element.NamespaceURI);
          changed = true;
          if (parent == null)
          {
            target.AppendChild(targetElement);
          }
          else
          {
            parent.AppendChild(targetElement);
          }

          foreach (XmlAttribute att in GetAttributes(xelement))
          {
            if (IsNamespaceAttribute(att))
              continue;

            targetElement.SetAttribute(att.LocalName, att.NamespaceURI, att.Value);
          }
          continue;
        }

        if (node is XmlXPathCDataSection cdata)
        {
          XmlElement parent = EnsureTargetParent(xelement, target, out changed);
          var targetCData = target.CreateCDataSection(cdata.Value);
          changed = true;
          if (parent == null)
          {
            target.AppendChild(targetCData);
            AppendNextTexts(node, targetCData, target);
          }
          else
          {
            if (parent.ChildNodes.Count == 1 && parent.ChildNodes[0] is XmlCharacterData)
            {
              parent.RemoveChild(parent.ChildNodes[0]);
            }
            parent.AppendChild(targetCData);
            AppendNextTexts(node, targetCData, parent);
          }
          continue;
        }

        if (node is XmlXPathText text)
        {
          XmlElement parent = EnsureTargetParent(xelement, target, out changed);
          var targetText = target.CreateTextNode(text.Value);
          changed = true;
          if (parent == null)
          {
            target.AppendChild(targetText);
            AppendNextTexts(node, targetText, target);
          }
          else
          {
            if (parent.ChildNodes.Count == 1 && parent.ChildNodes[0] is XmlCharacterData)
            {
              parent.RemoveChild(parent.ChildNodes[0]);
            }
            parent.AppendChild(targetText);
            AppendNextTexts(node, targetText, parent);
          }
          continue;
        }
      }
      return changed;
    }

    private static void AppendNextTexts(XmlNode textNode, XmlNode targetTextNode, XmlNode parent)
    {
      do
      {
        if (textNode.NextSibling is XmlText text)
        {
          var newText = targetTextNode.OwnerDocument.CreateTextNode(text.Value);
          parent.AppendChild(newText);
        }
        else
        {
          var cdata = textNode.NextSibling as XmlCDataSection;
          if (cdata == null)
            break;

          var newCData = targetTextNode.OwnerDocument.CreateCDataSection(cdata.Value);
          parent.AppendChild(newCData);
        }
        textNode = textNode.NextSibling;
      }
      while (true);
    }

    private static XmlElement EnsureTargetParent(IXmlXPathNode element, XmlDocument target, out bool changed)
    {
      changed = false;
      if (element.ParentNode is XmlXPathElement parent)
      {
        if (string.IsNullOrEmpty(parent.XPathExpression))
          return null;

        if (target.SelectSingleNode(parent.XPathExpression, element.OwnerDocument.XPathNamespaceManager) is XmlElement targetElement)
          return targetElement;

        var parentElement = EnsureTargetParent(parent, target, out changed);
        targetElement = target.CreateElement(parent.LocalName, parent.NamespaceURI);
        parentElement.AppendChild(targetElement);
        changed = true;
        return targetElement;
      }
      return target.DocumentElement;
    }
  }

  public class XmlXPathElement : XmlElement, IXmlXPathNode
  {
    private Lazy<string> _xPathExpression;

    public XmlXPathElement(string prefix, string localName, string namespaceURI, XmlXPathDocument doc) : base(prefix, localName, namespaceURI, doc)
    {
      _xPathExpression = new Lazy<string>(() => GetXPathExpression());
    }

    public new XmlXPathDocument OwnerDocument => (XmlXPathDocument)base.OwnerDocument;
    public virtual string XPathExpression => _xPathExpression.Value;

    private static string GetAttEscapedValue(string value)
    {
      if (value.IndexOf('\'') >= 0)
        return "=\"" + value.Replace("\"", "&quot;") + "\"";

      return "='" + value + "'";
    }

    private string GetDiscriminantAttributeXPath()
    {
      foreach (var att in OwnerDocument._discriminantAttributes)
      {
        XmlAttribute disc;
        if (string.IsNullOrEmpty(att.Item2))
        {
          disc = GetAttributeNode(att.Item1);
        }
        else
        {
          disc = GetAttributeNode(att.Item1, att.Item2);
        }

        if (disc != null)
        {
          string newPrefix = OwnerDocument.GetPrefix(NamespaceURI);
          string name = Name + "[@" + disc.Name + GetAttEscapedValue(disc.Value) + "]";
          if (newPrefix != null)
          {
            name = newPrefix + ":" + name;
          }
          return name;
        }
      }
      return null;
    }

    private string GetAttributesXPath()
    {
      if (Attributes.Count == 0)
        return null;

      var sb = new StringBuilder();
      foreach (XmlAttribute att in Attributes)
      {
        if (sb.Length > 0)
        {
          sb.Append(" and ");
        }

        sb.Append("@");
        sb.Append(att.Name);
        sb.Append(GetAttEscapedValue(att.Value));
        OwnerDocument.GetPrefix(att.NamespaceURI);
      }

      var text = sb.ToString().Trim();
      if (text.Length == 0)
        return null;

      return "[" + text + "]";
    }

    private string GetXPath(XmlNodeList parentNodes)
    {
      string discriminant = GetDiscriminantAttributeXPath();
      if (discriminant != null)
        return discriminant;

      string name = Name;
      string newPrefix = OwnerDocument.GetPrefix(NamespaceURI);
      if (newPrefix != null)
      {
        name = newPrefix + ":" + LocalName;
      }

      if (parentNodes.Count == 1)
        return name;

      var sameName = new List<XmlElement>();
      foreach (XmlNode node in parentNodes)
      {
        if (node.NodeType != XmlNodeType.Element)
          continue;

        if (node.Name == Name)
        {
          sameName.Add((XmlElement)node);
        }
      }

      if (sameName.Count == 1)
        return name;

      string byIndex = null;
      var sameAtts = new List<XmlElement>();
      for (int i = 0; i < sameName.Count; i++)
      {
        if (sameName[i] == this)
        {
          byIndex = name + "[" + (i + 1) + "]";
          continue;
        }

        bool same = true;
        foreach (XmlAttribute att in Attributes)
        {
          XmlAttribute sameAtt = sameName[i].Attributes[att.LocalName, att.NamespaceURI];
          if (sameAtt == null || string.Compare(sameAtt.Value, att.Value, StringComparison.OrdinalIgnoreCase) != 0)
          {
            same = false;
            break;
          }
        }

        if (same)
        {
          sameAtts.Add(sameName[i]);
        }
      }

      if (sameAtts.Count == 0)
        return name + GetAttributesXPath();

      return byIndex;
    }

    private string GetXPathExpression()
    {
      if (ParentNode == null)
      {
        string name = Name;
        string newPrefix = OwnerDocument.GetPrefix(NamespaceURI);
        if (newPrefix != null)
        {
          name = newPrefix + ":" + name;
        }
        return name;
      }

      string expr = GetXPath(ParentNode.ChildNodes);
      if (ParentNode is XmlXPathElement parent)
      {
        expr = parent.XPathExpression + "/" + expr;
      }

      if (ParentNode.NodeType == XmlNodeType.Document)
      {
        expr = "/" + expr;
      }
      return expr;
    }
  }

  public class XmlXPathText : XmlText, IXmlXPathNode
  {
    private Lazy<string> _xPathExpression;

    public XmlXPathText(string data, XmlXPathDocument doc) : base(data, doc)
    {
      _xPathExpression = new Lazy<string>(() => GetTextXPathExpression(this));
    }

    public new XmlXPathDocument OwnerDocument => (XmlXPathDocument)base.OwnerDocument;
    public virtual string XPathExpression => _xPathExpression.Value;

    internal static string GetTextXPathExpression(XmlNode node)
    {
      if (node.ParentNode is IXmlXPathNode element)
        return element.XPathExpression + "/text()";

      return null;
    }
  }

  public class XmlXPathCDataSection : XmlCDataSection, IXmlXPathNode
  {
    private Lazy<string> _xPathExpression;

    public XmlXPathCDataSection(string data, XmlXPathDocument doc) : base(data, doc)
    {
      _xPathExpression = new Lazy<string>(() => XmlXPathText.GetTextXPathExpression(this));
    }

    public new XmlXPathDocument OwnerDocument => (XmlXPathDocument)base.OwnerDocument;
    public virtual string XPathExpression => _xPathExpression.Value;
  }

  public interface IXmlXPathNode
  {
    string XPathExpression { get; }
    XmlNode ParentNode { get; }
    XmlXPathDocument OwnerDocument { get; }
  }
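
The four cases listed before the code in this post reduce to a rule on names alone. A minimal sketch of that rule, ignoring the hierarchy entirely and assuming ordinal (case-sensitive) comparison, might look like the following; NameMerge and Merge are hypothetical names, not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NameMerge
{
    private const string Suffix = "_ToBeDeleted";

    // Returns the output names: marked File 1 leftovers first, then every File 2 name.
    public static List<string> Merge(IEnumerable<string> file1Names, IEnumerable<string> file2Names)
    {
        var second = file2Names.ToList();
        var inSecond = new HashSet<string>(second, StringComparer.Ordinal);
        var result = new List<string>();

        foreach (var name in file1Names)
        {
            // Strip an existing marker so that Case 3 compares "ABSR" with "ABSR".
            var baseName = name.EndsWith(Suffix, StringComparison.Ordinal)
                ? name.Substring(0, name.Length - Suffix.Length)
                : name;

            // Not in File 2: keep it, marked for deletion (Cases 2 and 4).
            // In File 2: skip it here, File 2 supplies the entry (Cases 1 and 3).
            if (!inSecond.Contains(baseName))
                result.Add(baseName + Suffix);
        }

        result.AddRange(second);
        return result;
    }
}

public static class Program
{
    public static void Main()
    {
        var cases = new[]
        {
            (File1: "ABSR", File2: "ABSR"),
            (File1: "ABSR", File2: "ABSR_001"),
            (File1: "ABSR_ToBeDeleted", File2: "ABSR"),
            (File1: "ABSR__001_ToBeDeleted", File2: "ABSR"),
        };

        foreach (var c in cases)
        {
            var output = NameMerge.Merge(new[] { c.File1 }, new[] { c.File2 });
            Console.WriteLine(c.File1 + " + " + c.File2 + " -> " + string.Join(", ", output));
        }
    }
}
```

Applying the same rule per parent group, rather than over one flat list, is what keeps the hierarchy intact.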
 
Group/Name is simple, but what is the significance of the <ChildGroups> that you posted in #1? Does it matter at all? Must it be preserved? Can it be merged? Can it be omitted? If a "name" is added to the output from file 2, which ChildGroups is it added to? Are empty ChildGroups removed?

Is the "_ToBeDeleted" rename just there because you don't know how to remove the node directly?
 
Attached are the sample files for reference.
File1 has a group 'ABSR001' and File2 has a group 'ABSR001_New'.
So the output should contain both 'ABSR001' with the suffix '_ToBeDeleted' and 'ABSR001_New'.
 

Attachments

  • Files.zip
    375.8 KB
Nodes may be missing from either file1 or file2. Whatever is missing from one file should be added from the other.
Is this the only rule? It sounds like you just want all distinct "names" from both files?
 
Yes, and it would be nicer if the suffix could be added in the same logic. Otherwise I was thinking of iterating over the output file and checking whether each node exists in File2, then adding or removing the '_ToBeDeleted' suffix accordingly.
 
There is a tree of groups/childgroups in those files; is this hierarchy of no concern? Do you compare groups (by name) no matter where they are? If so, do you intend to flatten the tree somehow?
If, for example, you only want the first-level groups, then there is only one group in each of your example files.
 
The output file should have the right parents and hierarchy. I want to add each group inside the right parent.
 
So if a name is to be deleted because the entire subtree for that name is missing in file2, then the entire hierarchy needs to be created in the output just to be able to put in a name that indicates it should be deleted?
 
I think I have understood the question and requirements:
  • compare groups by hierarchy, identify group by child Name value
  • start with file2 (modify as we go), remove groups that don't exist in file1
  • add extra groups from file1 to file2
  • save as new output file

Based on the sample files I wrote this example code:
C#:
//using System.Xml.Linq;
//using System.Xml.XPath;

void MergeXml()
{
    var doc1 = XDocument.Load(@"C:\Users\xylo\Downloads\Files\File1.xml");
    var doc2 = XDocument.Load(@"C:\Users\xylo\Downloads\Files\File2.xml");

    //start with file2 (modify as we go), remove groups that don't exist in file1
    foreach (var group in doc2.Descendants("Group").ToArray())
    {
        var xpath = GetXpathByChildName(group);
        var other = doc1.XPathSelectElement(xpath);
        if (other == null)
            group.Remove();
    }

    //add extra groups from file1 to file2
    foreach (var group in doc1.Descendants("Group").ToArray())
    {
        var xpath = GetXpathByChildName(group);
        var other = doc2.XPathSelectElement(xpath);
        if (other == null)
        {
            var parentgroup = group.Ancestors("Group").FirstOrDefault();
            if (parentgroup == null)
            {
                doc2.XPathSelectElement("//Model/Groups").Add(group);
            }
            else
            {
                var parentXpath = GetXpathByChildName(parentgroup);
                var otherparent = doc2.XPathSelectElement(parentXpath);
                otherparent.Element("ChildGroups").Add(group);
            }
        }
    }
    //save as output
    doc2.Save(@"C:\Users\xylo\Downloads\Files\output.xml");
}

string GetXpathByChildName(XElement element)
{
    var name = element.Element("Name").Value;
    return string.Join("/", element.AncestorsAndSelf().Reverse().Select(a => a.Name.LocalName).ToArray()) + $"[Name='{name}']";
}
 
What about the suffix to be added to a group name? If a group exists in file 1 but not in file 2, then the group should be added to the output file with the suffix added to its name.
Also, if a group exists in file 2 and not in file 1, it shouldn't be removed. It should be there in the output file.
 
So you just update the Name element. In place of line 15 (the group.Remove(); call):
C#:
group.Element("Name").Value += "_ToBeDeleted";
Insert at line 25 (just before the parentgroup lookup):
C#:
group.Element("Name").Value += "_New";
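
For readers who want something they can run without the attached files, here is a self-contained sketch of the behaviour described in the last question: every group from File2 is kept, and a group found only in File1 is copied into the matching parent with the '_ToBeDeleted' suffix. It is not the answer's code. It assumes the layout implied by that code (Model/Groups containing Group elements, each with a Name and an optional ChildGroups), and GroupMerger, MergeContainer and Merge are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class GroupMerger
{
    public const string Suffix = "_ToBeDeleted";

    // Merges the Group children of one container (Groups or ChildGroups) into another.
    public static void MergeContainer(XElement source, XElement target)
    {
        foreach (var group in source.Elements("Group"))
        {
            var name = (string)group.Element("Name");
            if (name == null)
                continue; // assumption: groups without a Name are ignored

            var match = target.Elements("Group")
                .FirstOrDefault(g => (string)g.Element("Name") == name);

            if (match == null)
            {
                // Only in file1: copy the whole subtree and mark its top-level name.
                // Nested groups inside the copy keep their original names.
                var copy = new XElement(group);
                copy.Element("Name").Value = name + Suffix;
                target.Add(copy);
                continue;
            }

            // In both files: descend into the children, creating ChildGroups if needed.
            var sourceChildren = group.Element("ChildGroups");
            if (sourceChildren == null)
                continue;

            var targetChildren = match.Element("ChildGroups");
            if (targetChildren == null)
            {
                targetChildren = new XElement("ChildGroups");
                match.Add(targetChildren);
            }
            MergeContainer(sourceChildren, targetChildren);
        }
    }

    // Starts from a copy of file2 so that neither input document is modified.
    public static XDocument Merge(XDocument file1, XDocument file2)
    {
        var output = new XDocument(file2);
        MergeContainer(file1.Root.Element("Groups"), output.Root.Element("Groups"));
        return output;
    }
}

public static class Program
{
    public static void Main()
    {
        var file1 = XDocument.Parse(
            "<Model><Groups><Group><Name>A</Name><ChildGroups>" +
            "<Group><Name>A1</Name></Group><Group><Name>A2</Name></Group>" +
            "</ChildGroups></Group></Groups></Model>");
        var file2 = XDocument.Parse(
            "<Model><Groups><Group><Name>A</Name><ChildGroups>" +
            "<Group><Name>A1</Name></Group><Group><Name>A3</Name></Group>" +
            "</ChildGroups></Group></Groups></Model>");

        Console.WriteLine(GroupMerger.Merge(file1, file2));
    }
}
```

Matching children by name inside each parent, instead of building an XPath string per group, avoids quoting problems when a name contains an apostrophe.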
 