Hi, I have started learning programming for my research. I am having a bit of trouble working with XML files since I am new to them. My question is: how can I match an attribute value of a node element and update the duplicate nodes with sub-elements in an XML file?
Input XML file:
<brand by="rtrtrtrt" date="2014/01/01" name="OOP-112200" insti="TGA">
<exp name="OOP-112200" own="TGA" descri="" sound_db="JJKO">
<elem name="abcd" sound_freq="abcd" c_ty="pv">
<feature number="48">
<tfgt v="0.1466469683747654" y="0.0" units="elem">
<swan sound_freq="abcd" first_name="g7tty" description="xyz">
<elem tension="SGCGGSCGSC" name="M_20_K40745170" sound_freq="mhr17:7907527-7907589" s_c="0">
<feature number="5748">
<tfgt v="0.1466469683747654" y="0.0" units="elem">
<swan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
<act db="Acc" value="PM_1234555|ta">
<act db="pval" value="0.1">
<act db="xyz" value="abc">
<per fre="Volum_5mb" value="89.00">
<per fre="Volum_40mb" value="44.00">
<per fre="Volum_70mb" value="77.00">
<elem tension="SGCGSCGSCGSCGSC" name="M_20_K40745172" sound_freq="mhr17:7907527-7907100" s_c="0">
<feature number="5748">
<tfgt v="0.1466469683747654" y="0.0" units="elem">
<swan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
<act db="Acc" value="PR_0987677|wa">
<act db="pval" value="0.99">
<act db="xyz" value="abc">
<per fre="Volum_5mb" value="99.00">
<per fre="Volum_40mb" value="57.00">
<per fre="Volum_70mb" value="88.00">
<elem tension="SGCGSCGSCGSCGSC" name="M_20_K40745172" sound_freq="mhr17:7907527-7907100" s_c="0">
<feature number="57477">
<tfgt v="0.1466469683747654" y="0.39999" units="elem">
<swan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
I want to find duplicate nodes by matching the name attribute of the elem elements in the file, and update each duplicate node with the same sub-elements (for example, the duplicate in this set is 'M_20_K40745172').
I am expecting:
<brand by="hhdhdh" date="2014/01/01" name="OOP-112200" insti="TGA">
<exp name="OOP-112200" own="TGA" descri="" sound_db="JJKO">
<elem name="abcd" sound_freq="abcd" c_ty="pv">
<feature number="48">
<tfgt v="0.1466469683747654" y="0.0" units="elem">
<swan sound_freq="abcd" first_name="g7tty" description="xyz">
<elem tension="SGCGGSCGSC" name="M_20_K40745170" sound_freq="mhr17:7907527-7907589" s_c="0">
<feature number="5748">
<tfgt v="0.1466469683747654" y="0.0" units="elem">
<swan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
<act db="Acc" value="PM_1234555|ta">
<act db="pval" value="0.1">
<act db="xyz" value="abc">
<per fre="Volum_5mb" value="89.00">
<per fre="Volum_40mb" value="44.00">
<per fre="Volum_70mb" value="77.00">
<elem tension="SGCGSCGSCGSCGSC" name="M_20_K40745172" sound_freq="mhr17:7907527-7907100" s_c="0">
<feature number="5748">
<tfgt v="0.1466469683747654" y="0.0" units="elem">
<swan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
<act db="Acc" value="PR_0987677|wa">
<act db="pval" value="0.99">
<act db="xyz" value="abc">
<per fre="Volum_5mb" value="99.00">
<per fre="Volum_40mb" value="57.00">
<per fre="Volum_70mb" value="88.00">
<elem tension="SGCGSCGSCGSCGSC" name="M_20_K40745172" sound_freq="mhr17:7907527-7907100" s_c="0">
<feature number="57477">
<tfgt v="0.1466469683747654" y="0.39999" units="elem">
<swan sound_freq="mhr17:7907527-7907589" first_name="g7tty" description="xyz">
<act db="Acc" value="PR_0987677|wa">
<act db="pval" value="0.99">
<act db="xyz" value="abc">
<per fre="Volum_5mb" value="99.00">
<per fre="Volum_40mb" value="57.00">
<per fre="Volum_70mb" value="88.00">
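For reference, a minimal sketch of how the duplicated name values could be detected with the standard library. The sample XML here is a hypothetical, well-formed miniature of the input (the closing tags are an assumption, since the listing above omits them), and duplicate_names is an illustrative helper, not part of any library.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, well-formed miniature of the input (closing tags are assumed).
SAMPLE_XML = """
<brand name="OOP-112200">
  <exp name="OOP-112200">
    <elem name="abcd"/>
    <elem name="M_20_K40745170"/>
    <elem name="M_20_K40745172"/>
    <elem name="M_20_K40745172"/>
  </exp>
</brand>
"""

def duplicate_names(root):
    """Return the name values shared by more than one <elem> node."""
    counts = Counter(elem.get("name") for elem in root.iter("elem"))
    return [name for name, n in counts.items() if n > 1]

root = ET.fromstring(SAMPLE_XML.strip())
duplicates = duplicate_names(root)
print(duplicates)  # ['M_20_K40745172']
```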
Any help would be appreciated. Thanks in advance!
What I have tried:
Right now I am studying XML parsing modules such as etree to solve this.
Earlier I used a pandas data frame to modify the XML, but I was not able to modify the duplicates. I don't want to ignore duplicates.
df:
   id              pval  Acc
1  M_20_K40745170  0.1   PM_1234555|ta
2  M_20_K40745172  0.99  PR_0987677|wa
Here's my script:
import xml.etree.ElementTree as et

tree = et.parse('input_xml.xml')
root = tree.getroot()

# df is the data frame shown above (columns: id, pval, Acc)
for index, row in df.iterrows():
    # root is already <brand>; find() returns only the first matching node
    node = root.find('.//elem[@name="{}"]//swan'.format(row['id']))
    print(node)
    for name in ['Acc', 'pval']:
        item = node.find('./act[@db="{}"]'.format(name))
        # compare with None: an element without children is falsy
        if item is None:
            item = et.SubElement(node, 'act')
            item.tail = '\n'
            item.set('db', name)
        # attribute values must be strings
        item.set('value', str(row[name]))

tree.write('out.xml')
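
One possible direction, sketched under assumptions: find() returns only the first matching element, whereas findall() returns every match, so looping over findall() would reach both 'M_20_K40745172' nodes. The sample XML below is a hypothetical, well-formed miniature (the closing tags and the nesting elem > feature > tfgt > swan are assumed), ROWS is a plain-Python stand-in for the data frame rows, and annotate is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed miniature of the input: closing tags and the
# nesting elem > feature > tfgt > swan are assumptions.
SAMPLE_XML = """
<brand name="OOP-112200">
  <exp name="OOP-112200">
    <elem name="M_20_K40745170"><feature><tfgt><swan/></tfgt></feature></elem>
    <elem name="M_20_K40745172"><feature><tfgt><swan/></tfgt></feature></elem>
    <elem name="M_20_K40745172"><feature><tfgt><swan/></tfgt></feature></elem>
  </exp>
</brand>
"""

# Stand-in for the data frame rows (same columns as df: id, pval, Acc).
ROWS = [
    {"id": "M_20_K40745170", "pval": 0.1, "Acc": "PM_1234555|ta"},
    {"id": "M_20_K40745172", "pval": 0.99, "Acc": "PR_0987677|wa"},
]

def annotate(root, rows, columns=("Acc", "pval")):
    """Add or update <act db=... value=...> under every matching <swan>."""
    for row in rows:
        # findall() yields all matches, so duplicated names are all updated.
        path = './/elem[@name="{}"]//swan'.format(row["id"])
        for swan in root.findall(path):
            for col in columns:
                act = swan.find('./act[@db="{}"]'.format(col))
                if act is None:  # compare with None; childless elements are falsy
                    act = ET.SubElement(swan, "act")
                    act.set("db", col)
                act.set("value", str(row[col]))  # attribute values must be str

root = ET.fromstring(SAMPLE_XML.strip())
annotate(root, ROWS)
updated = root.findall(".//swan/act[@db='Acc']")
print(len(updated))  # 3: one per <swan>, including both duplicates
```

With a real data frame, the rows could be produced with df.to_dict("records") and the tree written back with tree.write('out.xml') as in the script above.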