So I have an XML file that I want to bulk load into a database.
To create the database it would be nice to extract an XSD from it first.

But the file is 80 GB, so it won't open in Visual Studio.
It also has a fairly advanced structure, and my samples show that it would be easy to miss some quirks if I don't check the whole file.

Using an online converter is out of the question because of the file size, and I would like to avoid paying for a tool as this is probably a one-off.

I need some ideas please.

What I have tried:

Well, I've tried to google for ideas
Updated 20-Apr-20 7:38am
Comments
F-ES Sitecore 20-Apr-20 11:51am    
Just learn xsd and do it yourself.
Jörgen Andersson 20-Apr-20 12:54pm    
That's the backup plan
Dave Kreskowiak 20-Apr-20 14:47pm    
That's what I ended up doing. You can use the XSD tool as a start, if it works for you, but you're going to have to go through the resulting XSD and correct the mistakes that came from the assumptions it made about your XML.

In my case, the XML was my own creation and was still evolving while I developed the format. I scrapped the tool-generated XSD altogether and started from scratch. I ended up with a modular XSD design that I can modify as I evolve the XML.
Jörgen Andersson 20-Apr-20 14:52pm    
The problem is just that I didn't create the XML-file myself.
My first goal is to find all the classes and fields.
If I can use a tool I'll save a lot of time.
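For the "find all the classes and fields" step, a minimal sketch along the following lines might be a starting point. It is an illustration rather than anything posted in this thread: the names `PathSurvey` and `Collect` are made up, and it assumes that a list of distinct element and attribute paths is enough to see the structure. It streams with `XmlReader`, so the file size should not matter for memory. Note that namespace declarations show up as attributes too.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class PathSurvey
{
    // Streams the document once and counts every distinct element path.
    // Attributes are recorded as "<element path>/@<attribute name>".
    public static SortedDictionary<string, long> Collect(XmlReader reader)
    {
        var counts = new SortedDictionary<string, long>(StringComparer.Ordinal);
        var open = new List<string>();   // names of the currently open ancestors

        while (reader.Read())
        {
            if (reader.NodeType != XmlNodeType.Element) continue;

            // Forget elements at this depth or deeper that were closed earlier.
            if (open.Count > reader.Depth)
                open.RemoveRange(reader.Depth, open.Count - reader.Depth);
            open.Add(reader.LocalName);

            string path = "/" + string.Join("/", open);
            Bump(counts, path);

            if (reader.MoveToFirstAttribute())
            {
                do { Bump(counts, path + "/@" + reader.LocalName); }
                while (reader.MoveToNextAttribute());
                reader.MoveToElement();   // return to the owning element
            }
        }
        return counts;
    }

    private static void Bump(IDictionary<string, long> counts, string key)
    {
        counts.TryGetValue(key, out long n);
        counts[key] = n + 1;
    }
}
```

Calling `PathSurvey.Collect(XmlReader.Create(path))` and printing the keys gives one line per "class" or "field" candidate, with the count hinting at which ones are rare.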
Maciej Los 20-Apr-20 12:07pm    
Jorgen, have you tried the XML Schema Definition Tool (Xsd.exe), documented on Microsoft Docs?

BTW: is there any documentation for that XML?
If no documentation exists, try using the XmlReader class (System.Xml) in async mode to read the content of the XML file, or try loading the XML into a DataSet.

Well...

As agreed with @phil.o and @CHill60, the best way is to use the XML Schema Definition Tool (Xsd.exe), documented on Microsoft Docs.

Alternatively, you can use the XmlReader class (System.Xml) to read the XML data in async mode.

Good luck!
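A minimal sketch of what reading in async mode could look like, assuming the goal is only to see which element names occur. The class and method names (`AsyncElementCounter`, `CountAsync`) are hypothetical; the `Async` setting and `ReadAsync` are the parts that come from `XmlReader` itself.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using System.Xml;

public static class AsyncElementCounter
{
    // Counts how often each element name occurs, reading node by node.
    public static async Task<Dictionary<string, long>> CountAsync(Stream input)
    {
        // Async must be enabled in the settings before ReadAsync can be used.
        var settings = new XmlReaderSettings { Async = true };
        var counts = new Dictionary<string, long>();

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            while (await reader.ReadAsync())
            {
                if (reader.NodeType != XmlNodeType.Element) continue;
                counts.TryGetValue(reader.LocalName, out long n);
                counts[reader.LocalName] = n + 1;
            }
        }
        return counts;
    }
}
```

For a file on disk the stream would typically come from `File.OpenRead(path)`. Async reading does not make the scan faster by itself; it mainly keeps the calling thread free while waiting on I/O.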
 
 
Comments
Jörgen Andersson 20-Apr-20 13:05pm    
Getting an OutOfMemoryException.
It seems like it's trying to add the whole file to memory.
Maciej Los 20-Apr-20 13:19pm    
So, try to use XmlReader and try to split that big xml into smaller parts.
Jörgen Andersson 20-Apr-20 13:22pm    
I'm looking at that at the moment, but it's not an out of the box solution. It seems to need quite some work.
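As a rough idea of the work involved in splitting, here is a sketch under two assumptions that the thread does not confirm: the records are the direct children of the root element, and attributes on the root itself can be dropped. `XmlSplitter`, `Split` and the `openChunk` callback are hypothetical names; `XmlWriter.WriteNode` does the actual copying.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class XmlSplitter
{
    // Copies the root's child elements into chunk documents of at most
    // recordsPerChunk records each. openChunk(i) supplies the output for chunk i.
    // Returns the number of chunks written.
    public static int Split(XmlReader reader, int recordsPerChunk,
                            Func<int, TextWriter> openChunk)
    {
        reader.MoveToContent();                       // position on the root element
        string prefix = reader.Prefix, name = reader.LocalName, ns = reader.NamespaceURI;
        var settings = new XmlWriterSettings { CloseOutput = true };
        XmlWriter writer = null;
        int chunks = 0, inChunk = 0;

        reader.Read();                                // step inside the root
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Depth == 1)
            {
                if (writer == null)
                {
                    writer = XmlWriter.Create(openChunk(chunks++), settings);
                    writer.WriteStartElement(prefix, name, ns);
                }
                writer.WriteNode(reader, false);      // copies one record and advances past it
                if (++inChunk == recordsPerChunk)
                {
                    writer.WriteEndElement();
                    writer.Dispose();
                    writer = null;
                    inChunk = 0;
                }
            }
            else
            {
                reader.Read();                        // skip whitespace, comments, end tags
            }
        }
        if (writer != null)                           // close a partly filled last chunk
        {
            writer.WriteEndElement();
            writer.Dispose();
        }
        return chunks;
    }
}
```

A call such as `XmlSplitter.Split(reader, 10000, i => new StreamWriter("chunk_" + i + ".xml"))` would write numbered files; the file name pattern and chunk size are placeholders.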
Jörgen Andersson 27-Apr-20 4:46am    
My experiences:
When using XSD.exe, it's important to check whether you're running the 32-bit or the 64-bit version. The 32-bit version crashed after five seconds, while the 64-bit version ran for almost four hours before running out of resources. My conclusion is that if the file fits in memory, the 64-bit version of XSD.exe would have done the job.

Instead, I ended up using XmlReader to find all element names, with this code that I found on the net:
using System;
using System.Collections.Generic;
using System.Xml;

// documentPath is the path to the XML file.
// Maps each element name to the list of depths at which it occurs.
Dictionary<string, List<int>> nodeTable = new Dictionary<string, List<int>>();
using (XmlReader reader = XmlReader.Create(documentPath))
{
    // XmlReader streams the file node by node instead of loading it all.
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element)
        {
            if (!nodeTable.ContainsKey(reader.LocalName))
            {
                nodeTable.Add(reader.LocalName, new List<int>(new int[] { reader.Depth }));
            }
            else if (!nodeTable[reader.LocalName].Contains(reader.Depth))
            {
                nodeTable[reader.LocalName].Add(reader.Depth);
            }
        }
    }
}
Console.WriteLine("The node table has {0} items.", nodeTable.Count);
foreach (KeyValuePair<string, List<int>> kv in nodeTable)
{
    // Element name, number of distinct depths, then the depths themselves.
    Console.WriteLine("{0} [{1}]: {2}", kv.Key, kv.Value.Count, string.Join(", ", kv.Value));
}

Console.ReadKey();
I then searched the document for each of those elements and cut and pasted the items containing them into a new document, so that it held every element that occurs. Finally, I ran Xsd.exe on that file.
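The manual sampling step could possibly be replaced by letting .NET infer and refine a schema one record at a time. The sketch below is an untested-at-scale idea, not what was done in this thread: it assumes the records are the direct children of the root element, and it uses `XmlSchemaInference.InferSchema`, whose second overload refines an existing schema set with another instance. `StreamingSchemaInference` and `Infer` are hypothetical names. Only one record is held in memory at a time, but with millions of records the run time may be considerable.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class StreamingSchemaInference
{
    // Infers a schema for the record elements (children of the root),
    // refining it with every further record that is read.
    public static XmlSchemaSet Infer(XmlReader reader)
    {
        var inference = new XmlSchemaInference();
        XmlSchemaSet schemas = null;

        reader.MoveToContent();   // position on the root element
        reader.Read();            // step inside the root
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Depth == 1)
            {
                // Materialise a single record; ReadFrom advances past it.
                var record = (XElement)XNode.ReadFrom(reader);
                using (XmlReader recordReader = record.CreateReader())
                {
                    schemas = schemas == null
                        ? inference.InferSchema(recordReader)            // first record
                        : inference.InferSchema(recordReader, schemas);  // refine
                }
            }
            else
            {
                reader.Read();    // skip whitespace, comments, end tags
            }
        }
        return schemas;           // null if the root had no child elements
    }
}
```

The result describes the record element rather than the whole document, so the root wrapper would have to be added by hand. Each `XmlSchema` in `schemas.Schemas()` can be saved with its `Write` method. Elements that appear only in some records should come out as optional, but the inferred types still deserve a manual review.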
Maciej Los 27-Apr-20 8:19am    
Sounds like an answer...
According to this CodeProject article: Create an XSD Schema….without knowing a darn thing about XSD.

you can generate XSD from a DataSet as follows:
MyDataSet.WriteXmlSchema(@"MySchema.xsd");
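The one-liner assumes a populated DataSet already exists. A minimal sketch of getting there from XML, assuming a small representative sample has been extracted first (an 80 GB file will not fit into a DataSet). `DataSetSchemaSketch` and `SchemaFromSample` are hypothetical names; `ReadXml` with `XmlReadMode.InferSchema` and `WriteXmlSchema` are the DataSet calls doing the work.

```csharp
using System;
using System.Data;
using System.IO;

public static class DataSetSchemaSketch
{
    // Loads a small sample document into a DataSet, letting the DataSet
    // infer tables and columns, and returns the resulting XSD as text.
    public static string SchemaFromSample(TextReader sampleXml)
    {
        var dataSet = new DataSet();
        dataSet.ReadXml(sampleXml, XmlReadMode.InferSchema);

        using (var output = new StringWriter())
        {
            dataSet.WriteXmlSchema(output);
            return output.ToString();
        }
    }
}
```

The schema can only describe what the sample contains, so elements missing from the sample will be missing from the XSD.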
 
 
Comments
Jörgen Andersson 20-Apr-20 13:43pm    
Yeah, but the problem is that I don't have the dataset, I have an XML file 80 GB in size, so I can't even create the dataset without running out of memory.
At least not without a serious upgrade in RAM. :)
RickZeeland 20-Apr-20 14:19pm    
As Maciej suggested it would be best to extract the significant bits from the XML file as it probably has a lot of redundant data in it ... and as a last resort you can always drink some good wine to cheer yourself up :)
Jörgen Andersson 20-Apr-20 14:33pm    
There is definitely a lot of "redundant" data in it; every record seems to be roughly 10 KB to 20 KB.
The problem is just that while some parts are always there, others come and go. Or there are lots of nullable fields, if you want to think of it that way.
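One way to find out which parts "come and go" is to count, per field, how many records contain it. The sketch below assumes that records are the direct children of the root and fields are the direct children of a record, which the thread does not confirm; `OptionalFieldSurvey` and `Count` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class OptionalFieldSurvey
{
    // Returns the number of records and, for each field name,
    // the number of records in which that field occurs at least once.
    public static (long Records, Dictionary<string, long> Presence) Count(XmlReader reader)
    {
        var presence = new Dictionary<string, long>();
        var seenInRecord = new HashSet<string>();   // fields seen in the current record
        long records = 0;

        while (reader.Read())
        {
            if (reader.NodeType != XmlNodeType.Element) continue;

            if (reader.Depth == 1)                  // a new record starts
            {
                records++;
                seenInRecord.Clear();
            }
            else if (reader.Depth == 2 && seenInRecord.Add(reader.LocalName))
            {
                // First occurrence of this field within the current record.
                presence.TryGetValue(reader.LocalName, out long n);
                presence[reader.LocalName] = n + 1;
            }
        }
        return (records, presence);
    }
}
```

Any field whose presence count is lower than the record count is a candidate for a nullable column (or `minOccurs="0"` in the XSD).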

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)