cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Paul-Prime92
New Member

Split string using comma delimiter but ignore text within quotation marks? (line by line / Cloud Flow)

Hi all,

 

I've got a CSV file I'm trying to work through line by line by converting it to an array and then pass the data through to our database if it meets certain criteria but frustratingly, all of the lines have quote marks as they're supposed to be strings and they all have commas in which is causing me havoc. Please see below for a test case of the data.

 

 

[
"AAB,\"X, Y, Z, XX, YY, ZZ\",HK49,8501943,?,black,ink,,305,TBD-RD40,1.99,395",
"ABC,,ACR110,900438,???,colour,ink,,313,TBA-HK19,1.99,401",
]

 

This is my information after it's been loaded up initially and then when I apply by each and go through it line by line using a split ( @{split(item(), ',')} ) I'm getting the below output.

 

 

[
"AAB",
"\"X",
"Y",
"Z",
"XX",
"YY", 
"ZZ\"",
"HK49",
"8501943",
"?",
"black",
"ink",
"",
"305",
"TBD-RD40",
"1.99",
"395"
]

 

 

This is causing me so many problems as I can't figure out a way to account for the text within the quote marks and simply ignore it. I'm trying to achieve the following output but I've got absolutely no idea how to go about it:

 

 

[
"AAB",
"\"X, Y, Z, XX, YY, ZZ\"",
"HK49",
"8501943",
"?",
"black",
"ink",
"",
"305",
"TBD-RD40",
"1.99",
"395"
]

 

I would be extremely appreciative if I could have some input on this as I've been scratching my head since yesterday trying to think of ways around this.

 

If there's anything I can do to assist I would be more than willing (within the realms of data security for my company)

3 REPLIES 3
eliotcole
Super User
Super User

Hi, @Paul-Prime92, this has actually come up a fair bit, matey. Take a look at the 'Recommendations' on the right there, and do a couple of searches in this forum on: CSV array

 

You're bound to find the answer you need.

 

In the meantime, I'll take a pop, as I might have a solution for this built already.

 

Suffice to say, maybe try to start with the headers, and see where you progress from there. I am assuming that you're wanting to make an object for each row with the headers as key names, and the respective column values as ... values. Starting at the top will help you build your logic.

 

Also, I assume that to get your current array of items all on one line, you're splitting your data on new lines, right?

 

Oh! and please edit into you original question either the source CSV data (how it looks in the CSV) or an example set of it.

 

Some CSVs are formatted with speech marks for every column item (which is wise for additional commas), so it may also be a thing.

 

Lastly, if you're OK with the privacy element (I'm pretty sure it's fine) then the Parse CSV action in the Plumsail connector is super useful here. 👍

takolota
Super User
Super User

eliotcole
Super User
Super User

Hi, @Paul-Prime92, hopefully you found a solution, but if not, the following may suffice. You can also add a third Select to further mess with this using xml() & XPath() commands, however this will allow you to iterate over each unknown column.

 

I made a bit more sample data, just to ensure it caters for any speech-marks inside a cell, and that's in this spoiler.

Spoiler
AAB,"X, Y, Z, XX, YY, ZZ",HK49,8501943,?,black,ink,,305,TBD-RD40,1.99,395
ABC,"X, Y, Z, XX, YY, ZZ",1.99,8501944,?,black,ink,,305,TBD-RD40,1.99,395
ABA,"X, Y, ""Z"", XX, YY, ZZ",HK49,8501945,?,black,ink,,305,TBD-RD40,1.89,423
AAB,"X, Y, Z, XX, YY, ZZ",HK49,8501946,?,black,ink,,305,TBD-RD40,1.99,395

Right ... onward!

 

Two Selects and a Carriage Return

This will work as long as the only CSV formatting that is done is "" for double-quotes in a cell, and surrounding cells with commas in them in double quotes.

0 - The Flow.jpg

There's some long expressions in here, but they're not too scary, just some replace() actions and silly names I made up. 😉

 

cCNST

This is just a carriage return in a Compose action. It allows you to quickly reference new lines in strings throughout.

 

Select reDelimiting

This basically makes a mess. 😅

02 - Select reDelimiting.jpg
From:
split(
    replace(
        replace(
            replace(
                replace(
                    outputs('csvCNST'), 
                    '""', 
                    'dubQu0t£y'
                ), 
                ',"', 
                'DEL1MI73oSt@rtyWartF@acE'
            ), 
            '",', 
            'DEL1MI73o'
        ), 
        outputs('cCNST'), 
        'n3wR3c0rdI@'
    ), 
    'DEL1MI73o'
)
Map:
if(
    startsWith(
        item(), 
        'St@rtyWartF@acE'
    ), 
    replace(
        replace(
            item(), 
            ',', 
            'c00m@rR$'
        ), 
        'St@rtyWartF@acE', 
        ''
    ), 
    item()
)

Essentially this Select action exists to 'encode' a lot of stuff into something that won't ever be a string.

 

Explaining the From inside-out:

Spoiler
  1. replace() #1 - Encodes any speech marks internal to a cell to dubQu0t£y.
    Now there should only be speech marks where formatted cells start and end.

  2. replace() #2 - Encodes the starts of cells beginning with speech marks to DEL1MI73oSt@rtyWartF@acE.
    Now there should only be speech marks where formatted cells end.

  3. replace() #3 - Encodes the ends of cells ending with speech marks to DEL1MI73o.
    Now there should be no speech marks.

  4. replace() #4 - Encodes any new lines as n3wR3c0rdI@.
    Now the whole thing is just one line with lots of commas and an occasional weird coded word!

  5. split() - Makes an array wherever there is DEL1MI73o.
    Now there are distinct items which begin with St@rtyWartF@acE.

Explaining the Map, from inside the if() statement:

Spoiler
  1. startsWith() - This true/false check looks to see if the current item() which the Select is processing begins with St@rtyWartF@acE.

  2. true - If the item() starts with St@rtyWartF@acE it will remove it, and encode any commas.
    1. replace() #1 - Encodes any commas as c00m@rR$.
    2. replace() #2 - Removes St@rtyWartF@acE.

  3. false - If the item() doesn't start with that silly word, then this item() returns as normal.

 

Select arrayItems

This fixes the mess! 🤓

03 - Select itemArrays.jpg
From:
split(
    join(
        body('Select_reDelimiting'), 
        ','
    ), 
    'n3wR3c0rdI@'
)
Map:
json(
    concat(
        '["', 
        replace(
            replace(
                replace(
                    item(), 
                    ',', 
                    '","'
                ), 
                'c00m@rR$', 
                ','
            ), 
            'dubQu0t£y', 
            '\"'
        ), 
        '"]'
    )
)

Here the Select action exists to make an array out of the data on each line by using the json() converter function. This ensures that before it does so that it finally runs replace() functions on the previously placed commas and speech marks.

 

Explaining the From inside-out:

Spoiler
  1. join() - Joins all the items made by the previous Select action together with a comma.
    Now there should be a single line of text again, with some funny words.

  2. split() - Finds every instance where the previous Select had encoded a new record (new line) into and makes a new array entry.
    Now there should only array items for each line of the original CSV except the commas and double quotes from speech marked cells are still encoded.

Explaining the Map inside-out:

Spoiler
  1. replace() #1 - Swaps any commas for a comma between two speech marks.

  2. replace() #2 - Decodes any c00m@rR$ into commas.

  3. replace() #3 - Decodes any to dubQu0t£y into an escaped speech mark (\").
    Now this particular item() should be a single line of text, with values separated by ",".

  4. replace() #4 - Encodes any new lines as n3wR3c0rdI@.
    Now the whole item() is just one line with lots of commas and an occasional weird coded word!

  5. concat() - This places a [" at the start and a "] at the end of this line.
    Now the item() is an array, but it is still in string format.

  6. json() - Converts the whole thing into acceptible JSON which can be read by subsequent actions.

 

Resultant Data

In the below spoiler is what the data that came out of my flow when I ran this.

Spoiler
[
  [
    "AAB",
    "X, Y, Z, XX, YY, ZZ",
    "HK49",
    "8501943",
    "?",
    "black",
    "ink",
    "",
    "305",
    "TBD-RD40",
    "1.99",
    "395"
  ],
  [
    "ABC",
    "X, Y, Z, XX, YY, ZZ",
    "1.99",
    "8501944",
    "?",
    "black",
    "ink",
    "",
    "305",
    "TBD-RD40",
    "1.99",
    "395"
  ],
  [
    "ABA",
    "X, Y, \"Z\", XX, YY, ZZ",
    "HK49",
    "8501945",
    "?",
    "black",
    "ink",
    "",
    "305",
    "TBD-RD40",
    "1.89",
    "423"
  ],
  [
    "AAB",
    "X, Y, Z, XX, YY, ZZ",
    "HK49",
    "8501946",
    "?",
    "black",
    "ink",
    "",
    "305",
    "TBD-RD40",
    "1.99",
    "395"
  ]
]

 

Helpful resources

Announcements
 WHAT’S NEXT AT MICROSOFT IGNITE 2022

WHAT’S NEXT AT MICROSOFT IGNITE 2022

Explore the latest innovations, learn from product experts and partners, level up your skillset, and create connections from around the world.

Register for a Free Workshop.png

Register for a Free Workshop

Learn to digitize and optimize business processes and connect all your applications to share data in real time.

Users online (1,706)