
[ Remove a line break inside a CSV row with a regular expression ]

Hello, I have this text:

1,0.00,,2.00,10,"Block. CertNot Valid.
Query with me",2013-06-20,0,0.00

This spans two lines in the CSV file, but it is really a single row of data. I want to remove the line break and put the row on one line using a regular expression.

I've tried (\")(.*)(\n)(.*)(\"), but it doesn't work.

Answer 1

Don't. There is no need to remove the line break.

Use the csv module to read the CSV file; it handles the line break correctly:

import csv

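# csvfilename is the path to your CSV file.
# Python 3: open in text mode with newline=''; on Python 2 use open(csvfilename, 'rb').
with open(csvfilename, newline='') as infile:
    reader = csv.reader(infile)
    for row in reader:
        print(repr(row[5]))  # the sixth column holds the quoted text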

will print:

'Block. CertNot Valid.\nQuery with me'

for that row.

This works because that column is correctly quoted: a line break inside double quotes is part of the field, not a row separator.
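
If the goal is still a file with one physical line per record, the same module can be used to write the rows back out with the embedded line breaks replaced. The following is only a minimal Python 3 sketch: the helper name flatten_newlines and the in-memory streams standing in for real files are assumptions for illustration, not part of the original answer.

```python
import csv
import io

def flatten_newlines(infile, outfile, replacement=" "):
    """Copy CSV rows from infile to outfile, replacing line breaks inside fields."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    for row in reader:
        # splitlines() splits on \n, \r\n, \r and other line boundaries;
        # joining the pieces puts the field back on a single line.
        writer.writerow([replacement.join(field.splitlines()) for field in row])

# The row from the question, used here as in-memory sample data.
sample = '1,0.00,,2.00,10,"Block. CertNot Valid.\nQuery with me",2013-06-20,0,0.00\n'
out = io.StringIO()
flatten_newlines(io.StringIO(sample, newline=""), out)
result = out.getvalue()
print(result)
```

With real files, both would be opened with newline='' as the csv documentation recommends. The writer drops the quotes around the field here because, once the line break is gone, the field no longer needs them.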

Answer 2

You can check the result here: https://www.debuggex.com/r/2_X5N-wTLZ2laJKh

Console output:

>>> import re
>>> regex = re.compile(r'"(.+?)"', re.DOTALL)  # DOTALL lets . match the line break
>>> regex.findall(string)
[u'Block. CertNot Valid.\nQuery with me', u'test\naaa', u'bbb\nvvvv']

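The value of 'string' includes the row below (the two extra matches above presumably come from other test rows that are not shown):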

1,0.00,,2.00,10,"Block. CertNot Valid.
Query with me",2013-06-20,0,0.00
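
The pattern above only finds the quoted fields; it does not remove the line break. One possible way to do the removal with regular expressions is sketched below in Python 3. The name join_quoted_lines is hypothetical, and the sketch assumes that double quotes in the input are balanced.

```python
import re

# A double-quoted run: an opening quote, any non-quote characters
# (including line breaks), and a closing quote.
QUOTED_FIELD = re.compile(r'"[^"]*"')

def join_quoted_lines(text, replacement=" "):
    """Replace line breaks that fall inside double-quoted fields."""
    def fix(match):
        # Only the text inside the quoted run is touched.
        return re.sub(r"\r\n|\r|\n", replacement, match.group(0))
    return QUOTED_FIELD.sub(fix, text)

# The row from the question, followed by its normal end-of-row newline.
text = '1,0.00,,2.00,10,"Block. CertNot Valid.\nQuery with me",2013-06-20,0,0.00\n'
joined = join_quoted_lines(text)
print(joined)
```

The newline that ends the row sits outside the quotes, so it is left alone. For input with unbalanced or unusual quoting, the csv module from Answer 1 is likely the more robust choice.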