Hello, World! Tuesday 27th of September 2022 05:57:19 AM

Python Web Scraping, Data Cleaning, and Visualization

Basic Concepts


Data Cleaning Example


---Fetch the top-news list from the China Daily Chinese-language site; the code is as follows---

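import requests
from bs4 import BeautifulSoup

url = "http://cn.chinadaily.com.cn/"
# Download the home page; the timeout stops the script from hanging on a stalled connection.
contenthtml = requests.get(url, timeout=10)
# Decode the response body as UTF-8.
contenthtml.encoding = "utf-8"
soup = BeautifulSoup(contenthtml.text, 'lxml')
# Select every headline link in the top-news (yaowen) list.
data = soup.select('body > div.pc > div.container > div.container-right > div.yaowen > div.right-lei > ul > li > a')

for item in data:
    # Collect the headline text and its absolute URL.
    result = {
        'title': item.get_text(),
        # The hrefs appear to be protocol-relative ("//host/path"), so prepend the scheme.
        'link': 'https:' + item.get('href')
    }
    print(result['title'])
    print(result['link'])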
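
The scraped titles and links can still contain stray whitespace, empty entries, or duplicates. The following is a minimal cleaning sketch, not the original author's method: the helper name `clean_items`, the `BASE_URL` constant, and the sample rows are all hypothetical, and it assumes the input is a list of `(title, href)` pairs. It uses only the standard library, so it runs without network access.

```python
from urllib.parse import urljoin

# Assumed base URL used to resolve relative and protocol-relative links.
BASE_URL = "https://cn.chinadaily.com.cn/"


def clean_items(raw_items, base_url=BASE_URL):
    """Trim titles, resolve links, and drop blank or duplicate entries."""
    cleaned = []
    seen_links = set()
    for title, href in raw_items:
        # Collapse runs of whitespace (including newlines) into single spaces.
        title = " ".join((title or "").split())
        href = (href or "").strip()
        if not title or not href:
            continue  # skip anchors without text or without a target
        # urljoin resolves protocol-relative ("//host/path") and relative ("/path") hrefs.
        link = urljoin(base_url, href)
        if link in seen_links:
            continue  # keep only the first occurrence of each link
        seen_links.add(link)
        cleaned.append({"title": title, "link": link})
    return cleaned


# Hypothetical sample input, not real scraped data.
sample = [
    ("  Example headline \n", "//cn.chinadaily.com.cn/a/example.html"),
    ("Example headline", "//cn.chinadaily.com.cn/a/example.html"),
    ("", "//cn.chinadaily.com.cn/a/empty.html"),
    ("No link", None),
]
for row in clean_items(sample):
    print(row["title"], row["link"])
```

With the scraper above, the input could be built as `[(item.get_text(), item.get('href')) for item in data]`. Deduplicating on the resolved link rather than the title is a design choice: two different articles may share a headline, while the same article linked twice should appear only once.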


China Daily News

Unfair game
Hallmark of friendship
Open and pragmatic partnership for comprehensive cooperation
IMF move shows greater acceptance of renminbi
Making the Statue of Liberty Cry
Anti-trust probe key to fair flow of knowledge
Growing clamor in Japan for nuclear sharing alarming
Self-exposed culpability in the smearing of China
Toll of shame and political dysfunction: China Daily editorial
Sustainability key to long-term development of capital market
Building disaster-resilient cities necessary
Pressure on businesses expected to gradually ease
World needs less saber-rattling, more efforts to restore peace
Ukraine crisis puts Europe's autonomy at stake
Strict pandemic control measures not contrary to market economy
US can't force ASEAN to help contain China
'One country, two systems' best for HK
Addressing virus-related mental illnesses
Learning payoff
Better regulation key to healthy development of pet industry
New ROK leader brings changes and chances
China still attractive for foreign capital
Non-exclusion zone
By blaming HK, EU politicians show ignorance
New system ensures patriots administering HK
Election of next chief executive opens new chapter for HKSAR's development
New chapter begins for Hong Kong's development: China Daily editorial
Should cooking be taught at schools?
'Big white' angels help China fight virus

