
[ Having trouble using BeautifulSoup to parse WeatherUnderground ]

I'm trying to adapt some code to extract information from Weather Underground. However, the script I'm adapting was written in 2008, and the formatting on Weather Underground has changed since then. I'm having trouble with soup.body.nobr.b.string. I want to extract daily precipitation data for a given site, e.g. http://www.wunderground.com/history/airport/KBUF/2011/5/2/DailyHistory.html

import urllib2
from BeautifulSoup import BeautifulSoup

# Create/open a file called wunder-data.txt (which will be a comma-delimited file)
f = open('wunder-data.txt', 'w')

# Iterate through year, month, and day
for y in range(1980, 2007):
  for m in range(1, 13):
    for d in range(1, 32):

      # Check if leap year
      if y%400 == 0:
        leap = True
      elif y%100 == 0:
        leap = False
      elif y%4 == 0:
        leap = True
      else:
        leap = False

      # Skip dates that don't exist in this month
      if (m == 2 and leap and d > 29):
        continue
      elif (m == 2 and d > 28):
        continue
      elif (m in [4, 6, 9, 11] and d > 30):
        continue

      # Open wunderground.com url
      url = "http://www.wunderground.com/history/airport/KBUF/"+str(y)+ "/" + str(m) + "/" + str(d) + "/DailyHistory.html"
      page = urllib2.urlopen(url)

      # Get temperature from page
      soup = BeautifulSoup(page)
      dayTemp = soup.body.nobr.b.string

      # Format month for timestamp
      if len(str(m)) < 2:
        mStamp = '0' + str(m)
      else:
        mStamp = str(m)

      # Format day for timestamp
      if len(str(d)) < 2:
        dStamp = '0' + str(d)
      else:
        dStamp = str(d)

      # Build timestamp
      timestamp = str(y) + mStamp + dStamp

      # Write timestamp and temperature to file
      f.write(timestamp + ',' + dayTemp + '\n')

# Done getting data! Close file.
f.close()

Answer 1


Don't mess with parsing the HTML; it'll likely change again without notice. Get one of their CSV files instead (there are links at the bottom of the HTML pages) and parse it with the csv module.
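
For what it's worth, here is a minimal sketch of that approach, in Python 2 to match the question's code. It assumes the comma-delimited link simply appends ?format=1 to the same DailyHistory URL, and that the header row contains a column named PrecipitationIn; neither is guaranteed, so verify both against the live page before building on this.

import csv
import urllib2

# Assumed CSV endpoint: the DailyHistory URL with ?format=1 appended.
# Check the actual "Comma Delimited File" link target on the page.
url = ("http://www.wunderground.com/history/airport/KBUF/"
       "2011/5/2/DailyHistory.html?format=1")
response = urllib2.urlopen(url)

# The feed has historically embedded "<br />" at the end of each line,
# so strip that out before handing the text to the csv module.
lines = [line.replace('<br />', '').strip()
         for line in response.read().splitlines()]
lines = [line for line in lines if line]

reader = csv.DictReader(lines)
for row in reader:
    # DictReader keys come straight from the header row; if this prints
    # None, the column name has changed -- print reader.fieldnames to
    # see what the feed actually calls it.
    print row.get('PrecipitationIn')

The question's loop over years, months, and days would slot in unchanged; only the parsing step differs, and a renamed column shows up as a missing key rather than as a silently wrong tag.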