Hi, I have started learning programming for my research. I am having a bit of trouble working with XML files since I am new to them. My question is: how can I match an attribute value of a node and update duplicate nodes with sub-elements in an XML file?

Input XML file:
XML
    <brand by="rtrtrtrt" date="2014/01/01" name="OOP-112200" insti="TGA">
    <exp name="OOP-112200" own="TGA" descri="" sound_db="JJKO">
    <elem name="abcd" sound_freq="abcd" c_ty="pv">
    <feature number="48">
    <tfgt v="0.1466469683747654" y="0.0" units="elem">    
    <swan sound_freq="abcd" first_name="g7tty" description="xyz">
    
    <elem tension="SGCGGSCGSC" name="M_20_K40745170" sound_freq="mhr17:7907527-7907589" s_c="0">
    <feature number="5748">
    <tfgt v="0.1466469683747654" y="0.0" units="elem">    
    <swan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
    <act db="Acc" value="PM_1234555|ta">
    <act db="pval" value="0.1">
    <act db="xyz" value="abc">
    <per fre="Volum_5mb" value="89.00">
    <per fre="Volum_40mb" value="44.00">
    <per fre="Volum_70mb" value="77.00">
    
    
    <elem tension="SGCGSCGSCGSCGSC" name="M_20_K40745172" sound_freq="mhr17:7907527-7907100" s_c="0">
    <feature number="5748">
    <tfgt v="0.1466469683747654" y="0.0" units="elem">    
    <swan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
    <act db="Acc" value="PR_0987677|wa">
    <act db="pval" value="0.99">
    <act db="xyz" value="abc">
    <per fre="Volum_5mb" value="99.00">
	<per fre="Volum_40mb" value="57.00">
    <per fre="Volum_70mb" value="88.00">
    
    
    <elem tension="SGCGSCGSCGSCGSC" name="M_20_K40745172" sound_freq="mhr17:7907527-7907100" s_c="0">
    <feature number="57477">
    <tfgt v="0.1466469683747654" y="0.39999" units="elem">    
    <swan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">

I want to find duplicate nodes by matching the name attribute of the elem elements in the file, and then update each duplicate node with the same sub-elements. (For example, the duplicate in this set is 'M_20_K40745172'.)

I am expecting:
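The "find duplicates by name" step can be sketched on its own. This is a minimal example, not a definitive solution: the SAMPLE string is a hypothetical, shortened version of the input in which every element is properly closed (the listing above does not show closing tags), and duplicate_elem_names is an illustrative helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, well-formed miniature of the input file.
# Closing tags are an assumption; the posted listing does not show them.
SAMPLE = """
<brand name="OOP-112200">
  <exp name="OOP-112200">
    <elem name="abcd"><swan first_name="g7tty"/></elem>
    <elem name="M_20_K40745170"><swan first_name="g7tty"/></elem>
    <elem name="M_20_K40745172"><swan first_name="g7tty"/></elem>
    <elem name="M_20_K40745172"><swan first_name="g7tty"/></elem>
  </exp>
</brand>
"""


def duplicate_elem_names(root):
    """Return the name values that occur on more than one elem element."""
    counts = Counter(elem.get("name") for elem in root.iter("elem"))
    return [name for name, n in counts.items() if n > 1]


root = ET.fromstring(SAMPLE.strip())
duplicates = duplicate_elem_names(root)
print(duplicates)
```

root.iter("elem") walks the whole tree, so the helper does not depend on how deeply the elem elements are nested.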
XML
    <brand by="hhdhdh" date="2014/01/01" name="OOP-112200" insti="TGA">
    <exp name="OOP-112200" own="TGA" descri="" sound_db="JJKO">
    <elem name="abcd" sound_freq="abcd" c_ty="pv">
    <feature number="48">
    <tfgt v="0.1466469683747654" y="0.0" units="elem">    
    <swan sound_freq="abcd" first_name="g7tty" description="xyz">
    
    <elem tension="SGCGGSCGSC" name="M_20_K40745170" sound_freq="mhr17:7907527-7907589" s_c="0">
    <feature number="5748">
    <tfgt v="0.1466469683747654" y="0.0" units="elem">    
    <swan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
    <act db="Acc" value="PM_1234555|ta">
    <act db="pval" value="0.1">
    <act db="xyz" value="abc">
    <per fre="Volum_5mb" value="89.00">
    <per fre="Volum_40mb" value="44.00">
    <per fre="Volum_70mb" value="77.00">
    
    
    <elem tension="SGCGSCGSCGSCGSC" name="M_20_K40745172" sound_freq="mhr17:7907527-7907100" s_c="0">
    <feature number="5748">
    <tfgt v="0.1466469683747654" y="0.0" units="elem">    
    <swan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
    <act db="Acc" value="PR_0987677|wa">
    <act db="pval" value="0.99">
    <act db="xyz" value="abc">
    <per fre="Volum_5mb" value="99.00">
	<per fre="Volum_40mb" value="57.00">
    <per fre="Volum_70mb" value="88.00">
    
    
    <elem tension="SGCGSCGSCGSCGSC" name="M_20_K40745172" sound_freq="mhr17:7907527-7907100" s_c="0">
    <feature number="57477">
    <tfgt v="0.1466469683747654" y="0.39999" units="elem">    
    <swan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
    <act db="Acc" value="PR_0987677|wa">
    <act db="pval" value="0.99">
    <act db="xyz" value="abc">
    <per fre="Volum_5mb" value="99.00">
	<per fre="Volum_40mb" value="57.00">
    <per fre="Volum_70mb" value="88.00">

Any help would be appreciated. Thanks in advance!

What I have tried:

Right now I am studying XML parsing modules like etree to solve this.

I used a pandas data frame to modify the XML earlier, but I was not able to modify duplicates. I don't want to ignore duplicates.

df:

                   id  pval            Acc
    1  M_20_K40745170   0.1  PM_1234555|ta
    2  M_20_K40745172  0.99  PR_0987677|wa


Here's my script:
import xml.etree.ElementTree as et

tree = et.parse('input_xml.xml')
root = tree.getroot()

for index, row in df.iterrows():
    node = root.find('./brand/sec[@name="{}"]//mwan'.format(row['id']))
    print(node)

    for name in ['acc','pval']:
        item = node.find('./per[@name="{}"]'.format(name))

        if not item:
            item = et.SubElement(node, 'per')
            item.tail = '\n'
            item.set('fre', name)
            item.tail = '\n'
            item.set('value', (row[name]))

tree.write('out.xml')
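A few things in the script above probably explain why duplicates are not updated. find() returns only the first match, so a second elem with the same name is never visited. The path uses sec and mwan, while the sample file uses elem and swan. An ElementTree element with no children tests as false, so "if not item" cannot tell "not found" apart from "found but empty". The following is a minimal sketch of one way around this, under several assumptions: the file is well-formed (closing tags are added in the hypothetical SAMPLE), the new values go into act elements with db/value attributes under swan as in the expected output, and plain dicts stand in for the DataFrame rows (df.to_dict("records") gives the same shape). add_annotations is an illustrative name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed miniature of the input file (closing tags assumed).
SAMPLE = """
<brand name="OOP-112200">
  <exp name="OOP-112200">
    <elem name="abcd"><swan first_name="g7tty"/></elem>
    <elem name="M_20_K40745170"><swan first_name="g7tty"/></elem>
    <elem name="M_20_K40745172"><swan first_name="g7tty"/></elem>
    <elem name="M_20_K40745172"><swan first_name="g7tty"/></elem>
  </exp>
</brand>
"""

# Stand-in for the DataFrame rows; df.to_dict("records") has this shape.
rows = [
    {"id": "M_20_K40745170", "pval": 0.1, "Acc": "PM_1234555|ta"},
    {"id": "M_20_K40745172", "pval": 0.99, "Acc": "PR_0987677|wa"},
]


def add_annotations(root, rows, columns=("Acc", "pval")):
    """Add or update act elements under every swan whose elem matches a row id."""
    for row in rows:
        path = './/elem[@name="{}"]/swan'.format(row["id"])
        # findall() returns every match, so duplicate elem nodes are all updated.
        for swan in root.findall(path):
            for col in columns:
                item = swan.find('./act[@db="{}"]'.format(col))
                if item is None:  # explicit None check, not "if not item"
                    item = ET.SubElement(swan, "act")
                    item.set("db", col)
                # Attribute values must be strings, so convert numbers.
                item.set("value", str(row[col]))
    return root


root = add_annotations(ET.fromstring(SAMPLE.strip()), rows)
print(ET.tostring(root, encoding="unicode"))
```

With a real file, ET.parse('input_xml.xml').getroot() would replace ET.fromstring(...), and tree.write('out.xml') would save the result as in the original script.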
Updated 15-Jun-21 2:04am
Comments
[no name] 14-Jun-21 17:22pm    
Maybe figure out why this file needs "fixing" and see if it makes more sense to fix it where it is being created in the first place.
rebel0 15-Jun-21 0:33am    
Hi @Gerry Schmitz, I used a pandas data frame to modify the XML earlier, but I was not able to modify duplicates. I don't want to ignore duplicates.
Here's my script:

import xml.etree.ElementTree as et

tree = et.parse('input_xml.xml')
root = tree.getroot()

for index, row in df.iterrows():
    node = root.find('./brand/sec[@name="{}"]//mwan'.format(row['id']))
    print(node)

    for name in ['acc','pval']:
        item = node.find('./per[@name="{}"]'.format(name))

        if not item:
            item = et.SubElement(node, 'per')
            item.tail = '\n'
            item.set('fre', name)
            item.tail = '\n'
            item.set('value', (row[name]))

tree.write('out.xml')
Patrice T 15-Jun-21 4:24am    
Use Improve question to update your question.
So that everyone can pay attention to this information.
[no name] 16-Jun-21 1:58am    
I meant, what comes "before" the XML? You say "to modify XML earlier". If the file is "defective", fix the problem where the XML is created. Usually that's easier.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


