[ Having trouble using BeautifulSoup to parse WeatherUnderground ]
I'm trying to adapt some code to extract information from Weather Underground. However, the script I'm adapting was written in 2008, and the formatting on Weather Underground has changed since then. I'm having trouble with soup.body.nobr.b.string. I want to extract daily precipitation data from a given page, for example:
http://www.wunderground.com/history/airport/KBUF/2011/5/2/DailyHistory.html

import urllib2
from BeautifulSoup import BeautifulSoup
# Create/open a file called wunder-data.txt (which will be a comma-delimited file)
f = open('wunder-data.txt', 'w')
# Iterate through year, month, and day
for y in range(1980, 2007):
    for m in range(1, 13):
        for d in range(1, 32):
            # Check if leap year
            if y % 400 == 0:
                leap = True
            elif y % 100 == 0:
                leap = False
            elif y % 4 == 0:
                leap = True
            else:
                leap = False
            # Check if already gone through month
            # (February has 29 days in a leap year, otherwise 28;
            # April, June, September and November have 30)
            if (m == 2 and leap and d > 29):
                continue
            elif (m == 2 and not leap and d > 28):
                continue
            elif (m in [4, 6, 9, 11] and d > 30):
                continue
            # Open wunderground.com url
            url = "http://www.wunderground.com/history/airport/KBUF/" + str(y) + "/" + str(m) + "/" + str(d) + "/DailyHistory.html"
            page = urllib2.urlopen(url)
            # Get temperature from page
            soup = BeautifulSoup(page)
            dayTemp = soup.body.nobr.b.string
            # Format month for timestamp
            if len(str(m)) < 2:
                mStamp = '0' + str(m)
            else:
                mStamp = str(m)
            # Format day for timestamp
            if len(str(d)) < 2:
                dStamp = '0' + str(d)
            else:
                dStamp = str(d)
            # Build timestamp
            timestamp = str(y) + mStamp + dStamp
            # Write timestamp and temperature to file
            f.write(timestamp + ',' + dayTemp + '\n')
# Done getting data! Close file.
f.close()
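The hand-written leap-year and month-length checks in the loop above are easy to get wrong. A minimal sketch, assuming Python 3 and only the standard library, lets datetime.date do the calendar work and produces the same YYYYMMDD timestamps and the same URL pattern as the question. The names daily_urls and URL_TEMPLATE are hypothetical; the function only builds URLs and does not fetch or parse anything.

```python
import datetime

# URL pattern copied from the question; month and day are not zero-padded there.
URL_TEMPLATE = ("http://www.wunderground.com/history/airport/KBUF/"
                "{d.year}/{d.month}/{d.day}/DailyHistory.html")


def daily_urls(start, end):
    """Yield (timestamp, url) for every calendar day from start to end inclusive.

    datetime.date already knows month lengths and leap years, so no
    manual calendar checks are needed.
    """
    one_day = datetime.timedelta(days=1)
    day = start
    while day <= end:
        # strftime('%Y%m%d') gives the zero-padded timestamp the question
        # builds by hand, e.g. '19800101'.
        yield day.strftime('%Y%m%d'), URL_TEMPLATE.format(d=day)
        day += one_day


# Same span as range(1980, 2007) in the original loop.
for timestamp, url in daily_urls(datetime.date(1980, 1, 1),
                                 datetime.date(2006, 12, 31)):
    pass  # fetch and parse url here
```

Each (timestamp, url) pair can be fed into whatever fetching and parsing step is used.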
Answer 1
Don't mess with parsing the HTML; it will likely change again without notice. Instead, get one of their CSV files (there are links at the bottom of the HTML pages) and parse it with the csv module.
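A minimal sketch of the csv-module approach, assuming Python 3 and that the CSV text has already been downloaded from the link on the page (that URL is not given here). The helper name column_values is hypothetical, and the column names in SAMPLE are invented for illustration only; check the header row of a real download for the actual names.

```python
import csv
import io


def column_values(csv_text, column):
    """Return the values of one named column from CSV text.

    csv.DictReader uses the first row as the header, so columns are
    looked up by name rather than by position.
    """
    # strip() drops any leading/trailing blank lines around the data;
    # skipinitialspace tolerates a space after each comma.
    reader = csv.DictReader(io.StringIO(csv_text.strip()),
                            skipinitialspace=True)
    return [row[column] for row in reader]


# HYPOTHETICAL sample: invented header and values, not real Wunderground data.
SAMPLE = """Time,TemperatureF,PrecipitationIn
12:54 AM,51.1,0.00
1:54 AM,50.0,0.02
"""

print(column_values(SAMPLE, "PrecipitationIn"))  # ['0.00', '0.02']
```

Looking columns up by header name means a reordered or extended CSV still works, which is the robustness the answer is after.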