Hi all...
I have a large data set with a column of potentially useful tags. The issue is that I'm struggling to get the contents of the JSON (inside a single column) out into a form that is useful.
I would like to use the JSON name fields as headers for the columns and the attributes as the values in the rows, but wrestling this out is proving hard. There are far too many tags (which I am endeavouring to streamline), and the order in which they occur in the JSON string is completely random.
e.g.
{ "application":"name" , "type":"type 1","version":"version 2" }
{ "version":"version 3" , "application":"name 1" , "type":"type 2" }
{ "type":"type 1" , "application":"name 1" , "version":"version 2" }
The headers would logically be application, type, version etc., with the row values stored under each heading.
Any tips on how to get this done? I think I've exhausted the Power Query functions I'm familiar with.
If the order of the JSON were consistent then splitting would be straightforward, but this data is all over the place...
Thanks
Tony
Hi @Tony_Kuiper ,
the order shouldn't matter, usually. This works for me if those records sit in an array:
let
    Source = "[{ ""application"":""name"" , ""type"":""type 1"",""version"":""version 2"" },#(lf){ ""version"":""version 3"" , ""application"":""name 1"" , ""type"":""type 2"" },#(lf){ ""type"":""type 1"" , ""application"":""name 1"" , ""version"":""version 2"" }]",
    #"Parsed JSON" = Json.Document(Source),
    #"Converted to Table" = Table.FromList(#"Parsed JSON", Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Expanded Column1" = Table.ExpandRecordColumn(#"Converted to Table", "Column1", {"application", "type", "version"}, {"application", "type", "version"})
in
    #"Expanded Column1"
I'll have to try adding the brackets to see if it will behave as an array, because the default data doesn't come like that.
But how is the data delivered then? In different queries? Different columns?
I get a data file in CSV. There's about 40 columns and 95k rows.
One column has tags in JSON format. The issue I have is that the information I would use as a header doesn't appear in a consistent position, i.e. "title" can appear anywhere from position 1 up to position 20-something. If I split by delimiter and want to be sure all values for "title" are found, I end up with 20-plus extra columns. I don't have any real power to change how this data is created (I'm working on it), so I'm looking for a workaround to hunt out all the values tagged as "title" and collect them into a single column. I can then discard the columns of little value. The target is about 5 key columns.
Hopefully this makes sense.
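For reference, pulling a single field like "title" out of each row's JSON, regardless of where it sits in the string, can be sketched in M like this. The column name "Tags" and the sample values are assumptions for illustration, not the actual data:

```
let
    // Hypothetical stand-in for the CSV import; the real query would use Csv.Document
    Source = #table({"Tags"}, {
        {"{ ""title"":""report A"", ""type"":""type 1"" }"},
        {"{ ""type"":""type 2"", ""title"":""report B"" }"}
    }),
    // Json.Document parses each cell into a record, so field order no longer matters;
    // Record.FieldOrDefault returns null when "title" is missing instead of erroring
    AddTitle = Table.AddColumn(Source, "title",
        each Record.FieldOrDefault(Json.Document([Tags]), "title", null))
in
    AddTitle
```

Repeating the `Record.FieldOrDefault` call (or expanding the parsed record column) for each of the ~5 key fields avoids the 20-plus columns a delimiter split would create.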
I believe I'm starting to understand. I'm not sure yet whether the column names are the names in the JSON records or the field values, so I created solutions for both. Please check the file enclosed.
Basically, you add a column that collects all data from the current row (except the JSON field) for the first row of a new table, and then you determine the column names from the JSON in a second column. Finally, you create a new table in each row that assembles the values correctly.
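For the simpler of the two cases (column names taken from the JSON field names), the core idea can be sketched as follows. This is just a minimal illustration, not the enclosed file, and it assumes the JSON column is called "Tags":

```
let
    // Stand-in for the real source table with one JSON text column
    Source = #table({"Tags"}, {
        {"{ ""application"":""name"" , ""type"":""type 1"", ""version"":""version 2"" }"},
        {"{ ""version"":""version 3"" , ""application"":""name 1"" , ""type"":""type 2"" }"}
    }),
    // Parse each cell into a record
    Parsed = Table.AddColumn(Source, "Record", each Json.Document([Tags])),
    // Expand the record: the JSON field names become the column headers
    Expanded = Table.ExpandRecordColumn(Parsed, "Record",
        {"application", "type", "version"}, {"application", "type", "version"})
in
    Expanded
```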
Hi @Tony_Kuiper ,
Is your problem solved? If so, would you mind accepting the helpful replies as solutions? Then we can close the thread, and more people with the same requirement will find the solution quickly and benefit from it. Thank you!
Best Regards,
Community Support Team _ kalyj
I'll give this a try; however, even though my data appears to be JSON in array form, it doesn't behave like it when I try to parse it...
I have worked around it so far with a custom column that just does a LOT of if/then-type checks on the columns. It was a lot to create, but it works at least. It also runs a lot faster than expected, so it isn't a choke point on the (so far) 360k rows of records...
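If the cells are only almost-JSON (stray characters, inconsistent quoting), a defensive per-row parse can sometimes replace some of those if/then checks. A sketch, again assuming a hypothetical "Tags" column with made-up sample values:

```
let
    Source = #table({"Tags"}, {
        {"{ ""application"":""name 1"" }"},
        {"not valid json"}
    }),
    // try ... otherwise keeps malformed rows as null instead of failing the whole load
    Parsed = Table.AddColumn(Source, "Record",
        each try Json.Document(Text.Trim([Tags])) otherwise null)
in
    Parsed
```

Rows where Record is null can then be filtered out or inspected separately to see what the malformed values have in common.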