TAGS :Viewed: 18 - Published at: a few seconds ago

[ Python program to convert excel file to csv, issue in date column from excel ]

I am new to python and uising below code to convert excel file to csv

Code is

#!/bin/env python
import xlrd
import csv
from os import sys

def csv_from_excel(file1):
    workbook = xlrd.open_workbook(file1)
    worksheet = workbook.sheet_by_name('sheet1')
    csv1 = open('test.csv', 'wb')
    wr = csv.writer(csv1,quoting=csv.QUOTE_ALL)

    for rownum in xrange(worksheet.nrows):
        wr.writerow([unicode(entry).encode("utf-8") for entry in worksheet.row_values(rownum)])
    csv1.close()


if __name__ == "__main__":
    csv_from_excel(sys.argv[1])

But column from excel with below values

Case    Code    Date    Amount
5428165773  UA02    4/23/2014    $(1,626.00)

showing as

'Case','Code','Date','Amount'
'5428165773','UA02',,'41752.0','-1626.0'

I also tried adding this but it didn't helped

dialect='excel', quotechar="'"

Answer 1


Excel uses a floating point number that represents the amount of days since a fixed date. You can use the datetime module to calculate the date and create a string.

import datetime

exceldate = datetime.date(1899, 12, 30)

d = exceldate + datetime.timedelta(days=41752)

print d

datetime.date(2014, 4, 23)

new_date = '{}/{}/{}'.format(d.month, d.day, d.year)    

Answer 2


If you already installed pandas module, These code will read excel file and store as dataframe

import pandas as pd
xls = read_excel('path_to_file.xls')

Then

xls.to_csv('path_to_csv.csv')

will write dataframe into CSV

you can read more about this in,

http://pandas.pydata.org/pandas-docs/version/0.15.0/io.html#io-excel

http://pandas.pydata.org/pandas-docs/version/0.15.0/io.html#io-store-in-csv

Answer 3


I think the following function is what you need and it also deals with datetime.time:

def xldate_to_python_date(value):
    """
    convert xl date to python date
    """
    date_tuple = xlrd.xldate_as_tuple(value, 0)
    ret = None
    if date_tuple == (0, 0, 0, 0, 0, 0):
        ret = datetime.datetime(1900, 1, 1, 0, 0, 0)
    elif date_tuple[0:3] == (0, 0, 0):
        ret = datetime.time(date_tuple[3],
                            date_tuple[4],
                            date_tuple[5])
    elif date_tuple[3:6] == (0, 0, 0):
        ret = datetime.date(date_tuple[0],
                            date_tuple[1],
                            date_tuple[2])
    return ret

Here is the documentation of xldate_as_tuple. The above function is referenced from here

And by the way, your csv_from_excel function can be rewritten as the following if you use my library pyexcel:

import pyexcel

def csv_from_excel(file1):
    excel_file = pyexcel.Reader(file1)
    csv_file = pyexcel.Writer("test.csv")
    csv_file.write_reader(excel_file)
    csv_file.close()

More documentation can be found on read-the-docs

Answer 4


yr, mth, dy, hr, min, sec =xlrd.xldate_as_tuple(entry, 0)

this is solving my problem

Thanks everyone and first of all thanks to Alex for very useful direction , right in first attempt

Updates: My code looks like this now, but I am stuck at writing row level output

#!/bin/env python
import xlrd
import csv
from os import sys

def csv_from_excel(file1):
    workbook = xlrd.open_workbook(file1)
    worksheet = workbook.sheet_by_name('sheet1')
    csv1 = open('test.csv', 'wb')
    wr = csv.writer(csv1,quoting=csv.QUOTE_ALL)
    for rownum in xrange(worksheet.nrows):
        if rownum > 2:
            i=0
            for entry in worksheet.row_values(rownum):
                i=i+1
                if i==3:
                    yr, mnth, dy, hr, min, sec =xlrd.xldate_as_tuple(entry, 0)
                    print str(mnth)+'/'+str(dy)+'/'+str(yr)
                    #wr.writerow(str(mnth)+'/'+str(dy)+'/'+str(yr))
                else:
                    print entry
                    #wr.writerow(unicode(entry).encode("utf-8"))
    your_csv_file.close()

if __name__ == "__main__":
    csv_from_excel(sys.argv[1])

Current output 5428165773 UA02 4/23/2014 -1626.0

You would understand i need above output as 5428165773,UA02,4/23/2014,-1626.0

Please comment

Update: This problem solved as well by using print in for loops, insteas of writerow, since it expects whole row

Thanks