Fetching a List by Crawling


notty 2023. 10. 30. 13:34


import requests
from bs4 import BeautifulSoup
import pandas as pd

# Step 1: Send an HTTP GET request to each page of the list
pg_num = 66  # number of list pages to crawl
data = []    # one list of cell texts per table row, across all pages

for i in range(1, pg_num + 1):
    # NOTE: `url` is not defined anywhere in this post.
    # Set it to the address of list page `i` before this call.
    response = requests.get(url)

    # Step 2: Parse the HTML content of the page with BeautifulSoup
    soup = BeautifulSoup(response.content, 'html.parser')

    # Step 3: Locate the table and iterate through rows
    table = soup.find('table', {'class': 'table-list table-case'})  # class attribute of the list table
    rows = table.find_all('tr')[1:]  # assuming the first row is the header

    # Step 4: Extract the desired data from each row
    for row in rows:
        cols = row.find_all('td')
        cols = [elem.text.strip() for elem in cols]
        data.append(cols)

# Step 5: Create a DataFrame (after all pages have been collected)
df = pd.DataFrame(data, columns=['연번', '신청질병 내용', '심의결과', '심의연도', '주문', '청구취지', '신청내용', '신청인주장', '진료기록 및 의학적 소견', '인정사실', '관계법령', '위원회 판단 및 결론'])

# Click the button (Selenium; left commented out)
# driver.find_elements(By.CLASS_NAME,'btn-badge')[0].click()

# Step 6: Save the DataFrame to a file
# df.to_csv('output.csv', index=False)
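
Step 5 passes 12 column names to pd.DataFrame, and pandas raises a ValueError when the number of column names does not match the width of the collected rows. Whether the list table really has 12 cells per row is not shown in this post, so the following is only a minimal sketch of one way to guard against a mismatch. It assumes that padding short rows with None and cutting off extra cells is acceptable. normalize_rows is a hypothetical helper, not part of the original script, and the sample values are made up.

```python
def normalize_rows(rows, n_cols, fill=None):
    """Drop empty rows, then pad or truncate each row to exactly n_cols cells."""
    normalized = []
    for cells in rows:
        if not cells:
            # e.g. a row that contains only <th> cells yields an empty list
            continue
        cells = list(cells)[:n_cols]             # cut off any extra cells
        cells += [fill] * (n_cols - len(cells))  # pad rows that are too short
        normalized.append(cells)
    return normalized


# Made-up sample: one short row, one empty row, one row that is too long
sample = [['1', 'a'], [], ['2', 'b', 'c', 'd']]
cleaned = normalize_rows(sample, 3)
print(cleaned)
```

With this helper, Step 5 could be written as pd.DataFrame(normalize_rows(data, len(columns)), columns=columns), where columns is the list of 12 names. Truncation silently discards cells, so it is worth checking the row lengths once before relying on it.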
