I am working on a flow to read DMARC XML files, collect specific data, and export into a CSV file. I am able to take all the files from a folder, add them to a list, then run the flow scraping and exporting the data to CSV.
I found that some of the files have more than 1 field called "row". Most have one, some have 2-5. There are a few though that have 40+. I was originally going to just duplicate what I have working already, but to do this 50+ times is just too much. Does anyone have any tips on a setup that would be able to determine how many entries for this field there is in the file, then pull this as many times as needed?
Folder.GetFiles Folder: $'''NETWORKSHARE\\IT\\DMARC\\emailserver''' FileFilter: $'''*.xml''' IncludeSubfolders: False FailOnAccessDenied: True SortBy1: Folder.SortBy.NoSort SortDescending1: False SortBy2: Folder.SortBy.NoSort SortDescending2: False SortBy3: Folder.SortBy.NoSort SortDescending3: False Files=> xml_files
LOOP FOREACH CurrentItem IN xml_files
XML.ReadFromFile File: CurrentItem Encoding: XML.FileEncoding.DefaultEncoding XmlDocument=> XmlDocument
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/report_metadata/org_name''' TextValue=> org_name
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/report_metadata/email''' TextValue=> email
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/report_metadata/report_id''' TextValue=> report_id
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/policy_published/domain''' TextValue=> domain
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/record/row/source_ip''' TextValue=> source_ip
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/record/row/count''' TextValue=> count
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/record/row/policy_evaluated/disposition''' TextValue=> disposition
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/record/row/policy_evaluated/dkim''' TextValue=> dkim
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/record/row/policy_evaluated/spf''' TextValue=> spf
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/record/identifiers/header_from''' TextValue=> header_from
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/record/auth_results/spf/domain''' TextValue=> spf_domain
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/record/auth_results/spf/result''' TextValue=> spf_result
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/record[2]/row/source_ip''' TextValue=> source_ip2
ON ERROR
END
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/record[2]/row/count''' TextValue=> count2
ON ERROR
END
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/record[2]/row/policy_evaluated/disposition''' TextValue=> disposition2
ON ERROR
END
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/record[2]/row/policy_evaluated/dkim''' TextValue=> dkim2
ON ERROR
END
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/record[2]/row/policy_evaluated/spf''' TextValue=> spf2
ON ERROR
END
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/record[2]/identifiers/header_from''' TextValue=> header_from2
ON ERROR
END
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/record[2]/auth_results/spf/domain''' TextValue=> spf_domain2
ON ERROR
END
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''/feedback/record[2]/auth_results/spf/result''' TextValue=> spf_result2
ON ERROR
END
Variables.CreateNewDatatable InputTable: { ^['Org Name', 'E-Mail', 'Report ID', 'Domain', 'Source IP', 'Count', 'Disposition', 'Dkim', 'SPF', 'From', 'SPF Domain', 'Result'], [org_name, email, report_id, domain, source_ip, count, disposition, dkim, spf, header_from, spf_domain, spf_result], [org_name, email, report_id, domain, source_ip2, count2, disposition2, dkim2, spf2, header_from2, spf_domain2, spf_result2] } DataTable=> DataTable
File.WriteToCSVFile.WriteCSV VariableToWrite: DataTable CSVFile: $'''NETWORKSHARE\\IT\\DMARC\\Master.csv''' CsvFileEncoding: File.CSVEncoding.SystemDefault IncludeColumnNames: True IfFileExists: File.IfFileExists.Append ColumnsSeparator: File.CSVColumnsSeparator.SystemDefault
END
Hi,
I could not visualise the exact issue you are referring to as there is no sample xml structure, but I am assuming that you are trying to say that there are multiple nodes called "row".
One or more nodes can be retrieved using Xpath.
Scroll down in the above link and find the correct Xpath query for the row nodes and give the same in the PAD action.
I misspoke. The field name is "record" not "row".
I cleaned up my flow and replaced the full paths with the more generalized version so that '''/feedback/record/row/policy_evaluated/disposition''' is now "//disposition''. I think my issue is that I am not writing the values to a list and the list should then be pushed to the csv, but i don't think it works that way. am I close?
Flow:
Folder.GetFiles Folder: $'''Z:\\IT\\DMARC\\emailserver''' FileFilter: $'''*.xml''' IncludeSubfolders: False FailOnAccessDenied: True SortBy1: Folder.SortBy.NoSort SortDescending1: False SortBy2: Folder.SortBy.NoSort SortDescending2: False SortBy3: Folder.SortBy.NoSort SortDescending3: False Files=> xml_files
LOOP FOREACH CurrentItem IN xml_files
XML.ReadFromFile File: CurrentItem Encoding: XML.FileEncoding.DefaultEncoding XmlDocument=> XmlDocument
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''//org_name''' TextValue=> org_name
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''//email''' TextValue=> email
XML.GetXmlElementValue.GetElementValueAsNumeric Document: XmlDocument XPathQuery: $'''//report_id''' NumericValue=> report_id
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''//domain''' TextValue=> domain
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''//source_ip''' TextValue=> source_ip
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''//count''' TextValue=> count
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''//disposition''' TextValue=> disposition
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''//dkim''' TextValue=> dkim
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''//spf''' TextValue=> spf
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''//header_from''' TextValue=> header_from
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''//domain''' TextValue=> spf_domain
XML.GetXmlElementValue.GetElementValue Document: XmlDocument XPathQuery: $'''//result''' TextValue=> spf_result
Variables.CreateNewDatatable InputTable: { ^['Org Name', 'E-Mail', 'Report ID', 'Domain', 'Source IP', 'Count', 'Disposition', 'Dkim', 'SPF', 'From', 'SPF Domain', 'Result'], [org_name, email, report_id, domain, source_ip, count, disposition, dkim, spf, header_from, spf_domain, spf_result] } DataTable=> DataTable
File.WriteToCSVFile.WriteCSV VariableToWrite: DataTable CSVFile: $'''Z:\\IT\\DMARC\\Master.csv''' CsvFileEncoding: File.CSVEncoding.SystemDefault IncludeColumnNames: True IfFileExists: File.IfFileExists.Append ColumnsSeparator: File.CSVColumnsSeparator.SystemDefault
END
XML:
<?xml version="1.0" ?>
- <feedback>
- <report_metadata>
<org_name>Yahoo</org_name>
<email>dmarchelp@yahooinc.com</email>
<report_id>166683.334100</report_id>
- <date_range>
<begin>1666742400</begin>
<end>1666828799</end>
</date_range>
</report_metadata>
- <policy_published>
<domain>REMOVED.com</domain>
<adkim>r</adkim>
<aspf>r</aspf>
<p>reject</p>
<pct>100</pct>
</policy_published>
- <record>
- <row>
<source_ip>166.x.x.x</source_ip>
<count>1</count>
- <policy_evaluated>
<disposition>reject</disposition>
<dkim>fail</dkim>
<spf>fail</spf>
</policy_evaluated>
</row>
- <identifiers>
<header_from>REMOVED.com</header_from>
</identifiers>
- <auth_results>
- <spf>
<domain>REMOVED.com</domain>
<result>fail</result>
</spf>
</auth_results>
</record>
- <record>
- <row>
<source_ip>208.x.x.x</source_ip>
<count>41</count>
- <policy_evaluated>
<disposition>reject</disposition>
<dkim>fail</dkim>
<spf>fail</spf>
</policy_evaluated>
</row>
- <identifiers>
<header_from>REMOVED.com</header_from>
</identifiers>
- <auth_results>
- <spf>
<domain>REMOVED.com</domain>
<result>fail</result>
</spf>
</auth_results>
</record>
- <record>
- <row>
<source_ip>40.x.x.x</source_ip>
<count>1</count>
- <policy_evaluated>
<disposition>none</disposition>
<dkim>pass</dkim>
<spf>pass</spf>
</policy_evaluated>
</row>
- <identifiers>
<header_from>REMOVED.com</header_from>
</identifiers>
- <auth_results>
- <dkim>
<domain>REMOVED.com</domain>
<selector>selector1</selector>
<result>pass</result>
</dkim>
- <spf>
<domain>REMOVED.com</domain>
<result>pass</result>
</spf>
</auth_results>
</record>
- <record>
- <row>
<source_ip>47.x.x.x</source_ip>
<count>1</count>
- <policy_evaluated>
<disposition>reject</disposition>
<dkim>fail</dkim>
<spf>fail</spf>
</policy_evaluated>
</row>
- <identifiers>
<header_from>REMOVED.com</header_from>
</identifiers>
- <auth_results>
- <spf>
<domain>REMOVED.com</domain>
<result>fail</result>
</spf>
</auth_results>
</record>
</feedback>
XML