BeautifulSoup is good at parsing web page data. However, BeautifulSoup on Python for Android has a bug:
text = h4.a.text only ever returns None, so I wrote a function, getText(), to work around it.
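As a rough sketch of that workaround (not part of the BeautifulSoup API): instead of reading .text, take the tag's string form and slice out whatever lies between the first '>' and the closing '</a>'. The name extract_link_text and the sample string are hypothetical, and the sketch assumes the link contains no nested tags.

```python
def extract_link_text(tag_html):
    """Return the text between the first '>' and the following '</a>', or None."""
    begin = tag_html.find('>')          # end of the opening <a ...> tag
    if begin == -1:
        return None                     # not a tag at all
    begin += 1
    end = tag_html.find('</a>', begin)  # start of the closing tag
    if end <= begin:
        return None                     # empty link or missing closing tag
    return tag_html[begin:end].strip()


# Hypothetical sample input, shaped like str(h4.a)
sample = '<a href="http://example.com/post/1"> Hello, geek news </a>'
print(extract_link_text(sample))
```

If the link wraps other markup, for instance a <b> element, the inner tags come back as part of the returned string, so this only suits simple links.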
Example: scraping the CSDN Geek Headlines (极客头条), soup.py
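import urllib2, re
from BeautifulSoup import BeautifulSoup
import sys

# Python 2 only: make UTF-8 the default encoding so that str() on a tag
# containing Chinese text does not raise an encoding error
reload(sys)
sys.setdefaultencoding('utf-8')

def getText(text):
    # Return the text between the opening tag's '>' and the closing tag, or None
    begin = text.find('>', 0)
    if begin > -1:
        begin += 1
        end = text.find('</a>', begin)  # closing tag, assumed to be '</a>'
        if begin < end:
            return text[begin:end].strip()
        else:
            return None
    else:
        return None

page = urllib2.urlopen("http://geek.csdn.net/new")
soup = BeautifulSoup(page)
for h4 in soup.findAll('h4'):
    if h4.a is not None:
        href = h4.a.get('href')
        # h4.a.text returns None here, so slice the text out of the tag's string form
        text = getText(str(h4.a))
        print text
        print href
page.close()

See also: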