
Day 89 - web scraping (2)

ksyke 2024. 12. 4. 10:09


Beautiful Soup Documentation (Beautiful Soup 4.12.0): https://www.crummy.com/software/BeautifulSoup/bs4/doc/

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree.

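    import requests
    from bs4 import BeautifulSoup

    # Assumption: url1 is the Beautiful Soup documentation page linked above
    url1 = 'https://www.crummy.com/software/BeautifulSoup/bs4/doc/'

    # requests.get(url1).content  # raw bytes (binary string)
    msg = requests.get(url1).text  # response body decoded to str
    soup = BeautifulSoup(msg, 'html.parser')

    # Other ways to explore the tree:
    # soup.find_all('section', id='kinds-of-objects')  # <section> tags with that id
    # list(soup.find(class_='body').children)          # direct children of .body
    # soup.css.select_one('.body h2').next_element.get_text()  # first node inside the first <h2>

    # Collect the text of every <h1> nested inside the element with class "body"
    arr1 = []
    for h1 in soup.select('.body h1'):
        arr1.append(h1.get_text())
    print(arr1)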
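To try the same calls without a network request, here is a minimal sketch on a small hard-coded HTML string. The markup is hypothetical, only loosely shaped like the `.body` / `<section>` layout the code above targets, so the outputs describe this snippet and not the real documentation page.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup, loosely shaped like a docs page; not the real page content
html = """
<div class="body">
  <section id="intro">
    <h1>Beautiful Soup Documentation</h1>
    <h2>Quick Start</h2>
  </section>
  <section id="kinds-of-objects">
    <h1>Kinds of objects</h1>
  </section>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# CSS selector: every <h1> that is a descendant of an element with class "body"
h1_texts = [h1.get_text() for h1 in soup.select(".body h1")]

# find_all() with an attribute filter returns a list of matching tags
sections = soup.find_all("section", id="kinds-of-objects")

# .children yields direct children only, including whitespace text nodes,
# so keep only the Tag objects
child_names = [c.name for c in soup.find(class_="body").children if isinstance(c, Tag)]

# .next_element of the <h2> is the first node parsed after its start tag: its text
h2_text = str(soup.select_one(".body h2").next_element)

print(h1_texts)     # ['Beautiful Soup Documentation', 'Kinds of objects']
print(len(sections))  # 1
print(child_names)  # ['section', 'section']
print(h2_text)      # Quick Start
```

Swapping `html` for `requests.get(url1).text` gives the live version, where the actual results depend on the current page.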

     

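    # The first .next_sibling after </h1> is usually the newline text node,
    # so step twice to reach the next tag
    soup.css.select_one('.body').css.select_one('h1').next_sibling.next_sibling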

     

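    # .next_element is the next object in parse order; for an <h1> that is
    # usually the text node inside it, so this collects each heading's own text
    arr2 = []
    for h1 in soup.css.select_one('.body').css.select('h1'):
        # arr2.append(h1.next_sibling.next_sibling)  # the tag after each </h1>
        arr2.append(h1.next_element.get_text())
    arr2  # in a notebook, the last expression is displayed as the cell output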