Final Data Processing

notty 2023. 9. 24. 22:24

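Today's tasks

- Go through the data row by row and delete the rows whose values fall outside the normal range

~~Settled water turbidity is sometimes 1 NTU or higher -> needs deletion~~
~~Clearwell turbidity is recorded as 800 and 400 -> needs deletion (delete everything at 1 or above)~~
~~Clearwell turbidity is higher than settled water turbidity -> needs deletion~~
Raw water alkalinity values of 20 or below, 100 or above -> on hold
Raw water electrical conductivity values of 0 -> on hold
Raw water inflow of 1000 or below
- 6065     2013/09/11 05:00
  6488     2013/09/28 21:00
  22843    2015/08/16 14:00
  22844    2015/08/16 15:00
  31314    2016/08/05 15:00
  51027    2019/10/17 15:00
  52560    2019/12/20 13:00
  60508    2020/11/17 15:00
Raw water pH of 5 or below, 9 or

- Plot graphs by year, season, month, and day, for every variable

- Find at least three meaningful insights (Is that doable? Yes, it is.)

- Fine dust data rather than rainfall

- Find and download rainfall data from near the Bonpo intake station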
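
The range checks listed above can be written as boolean masks in pandas. This is only a rough sketch: the column names (`settled_turbidity`, `clearwell_turbidity`, `raw_inflow`), the helper `drop_out_of_range`, and the sample values are all hypothetical stand-ins, not the real dataset. Treating an inflow of 1000 or below as a removal condition is also an assumption, since the list only records those rows without a decision.

```python
import pandas as pd


def drop_out_of_range(df: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows that pass the range checks (hypothetical column names)."""
    bad = (
        (df["settled_turbidity"] >= 1)        # settled water turbidity of 1 NTU or more
        | (df["clearwell_turbidity"] >= 1)    # clearwell turbidity of 1 or more (e.g. 400, 800)
        | (df["clearwell_turbidity"] > df["settled_turbidity"])  # clearwell above settled water
        | (df["raw_inflow"] <= 1000)          # assumption: low inflow rows are dropped too
    )
    return df.loc[~bad].reset_index(drop=True)


# Toy data, made up for illustration only
sample = pd.DataFrame({
    "settled_turbidity":   [0.5, 1.2, 0.4, 0.3, 0.6],
    "clearwell_turbidity": [0.1, 0.1, 800.0, 0.35, 0.05],
    "raw_inflow":          [5000, 5200, 5100, 4900, 900],
})
cleaned = drop_out_of_range(sample)
print(cleaned)
```

Comparisons against NaN evaluate to False, so rows with missing values are kept by this mask and would need separate handling.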

import pandas as pd

# Convert a "24:00" timestamp to "00:00" on the next day so it can be parsed as a datetime
def cvt_24_to_00(ori_time):
    date_str, time_str = str(ori_time).split(" ")

    if time_str == "24:00":
        date_obj = pd.to_datetime(date_str, format='%Y/%m/%d')
        date_obj += pd.Timedelta(days=1)
        return date_obj.strftime('%Y/%m/%d') + " 00:00"
    else:
        return ori_time

# Apply the function to the 'logTime' column
df_1['logTime_dt'] = df_1['logTime'].apply(cvt_24_to_00)

# Parse the cleaned strings into datetime values
df_1['logTime_dt'] = pd.to_datetime(df_1['logTime_dt'], format='%Y/%m/%d %H:%M')
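
The same 24:00 handling can also be done without a row-by-row `apply`. The sketch below is an alternative, not the code used in this post: `parse_log_time` is a hypothetical helper, and it assumes every value follows the `YYYY/MM/DD HH:MM` pattern.

```python
import pandas as pd


def parse_log_time(series: pd.Series) -> pd.Series:
    """Parse 'YYYY/MM/DD HH:MM' strings, treating 24:00 as 00:00 of the next day."""
    text = series.astype(str).str.strip()
    is_24 = text.str.endswith("24:00")                    # rows that need the +1 day shift
    fixed = text.str.replace("24:00", "00:00", regex=False)
    parsed = pd.to_datetime(fixed, format="%Y/%m/%d %H:%M")
    return parsed + pd.to_timedelta(is_24.astype(int), unit="D")


raw = pd.Series(["2013/09/11 05:00", "2013/12/31 24:00"])
parsed = parse_log_time(raw)
print(parsed)
```

Adding the day after parsing means month and year rollovers (such as 12/31 24:00) are handled by pandas rather than by string manipulation.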

# Create year/month/day/hour columns
df_1['year'] = df_1['logTime_dt'].dt.year.astype(object)    # year
df_1['month'] = df_1['logTime_dt'].dt.month.astype(object)  # month
df_1['day'] = df_1['logTime_dt'].dt.day.astype(object)      # day
df_1['hour'] = df_1['logTime_dt'].dt.hour.astype(object)    # hour
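
The to-do list also calls for plots by season, which needs a season column next to year/month/day/hour. A minimal sketch, assuming the usual meteorological split (spring is March to May, summer June to August, autumn September to November, winter December to February); `SEASON_BY_MONTH` and `add_season` are hypothetical names that do not appear in the original code.

```python
import pandas as pd

# Assumed month-to-season mapping (meteorological seasons)
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def add_season(df: pd.DataFrame, month_col: str = "month") -> pd.DataFrame:
    """Return a copy of df with a 'season' column derived from the month column."""
    out = df.copy()
    # astype(int) also covers a month column stored with object dtype
    out["season"] = out[month_col].astype(int).map(SEASON_BY_MONTH)
    return out


demo = pd.DataFrame({"month": pd.Series([1, 4, 7, 10, 12], dtype=object)})
with_season = add_season(demo)
print(with_season)
```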

# Fix broken Korean characters in seaborn/matplotlib plots
import matplotlib.pyplot as plt
import seaborn as sns

plt.rcParams['font.family'] = 'Malgun Gothic'  # Korean font that ships with Windows
plt.rcParams['axes.unicode_minus'] = False     # avoid broken minus signs on the axes

# Correlation matrix
corr_matrix = df_cols.corr()

# Heatmap of the correlation matrix
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap="coolwarm", fmt=".2f")

# Add a title and axis labels
plt.title("PACS 응집제 사용 Correlation Heatmap")  # "Correlation heatmap, PACS coagulant in use"
plt.xlabel("Features")
plt.ylabel("Features")
plt.show()
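
A large annotated heatmap can be hard to read, so it may help to rank the correlations against a single column as well. This is a sketch on toy data: `rank_correlations` and the column names (`dose`, `turbidity`, `ph`, `site`) are hypothetical, and the numbers are made up.

```python
import pandas as pd


def rank_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Correlation of every numeric column with `target`, strongest first."""
    corr = df.select_dtypes(include="number").corr()   # skip non-numeric columns
    return corr[target].drop(target).sort_values(key=lambda s: s.abs(), ascending=False)


# Toy data, for illustration only
demo = pd.DataFrame({
    "dose":      [1.0, 2.0, 3.0, 4.0, 5.0],
    "turbidity": [2.1, 3.9, 6.2, 8.0, 9.9],
    "ph":        [7.5, 7.1, 7.4, 7.0, 7.3],
    "site":      ["a", "a", "b", "b", "b"],
})
ranked = rank_correlations(demo, "dose")
print(ranked)
```

Sorting by absolute value keeps strong negative correlations near the top instead of pushing them to the bottom.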

# Stack the two DataFrames vertically
df_comb = pd.concat([df_PAC, df_PACS], axis=0)

# Set the '구분' (category) column with a lambda:
# 0 where 'PAC 주입률' (PAC dosing rate) is above 0, otherwise 1
df_comb['구분'] = df_comb['PAC 주입률'].apply(lambda x: 0 if x > 0 else 1)

# Save to Excel
df_comb.to_excel('df_comb.xlsx', index=False)
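
The flag column can also be built without a lambda. The sketch below uses `numpy.where` on toy frames (`df_pac`, `df_pacs`, and their values are made up), and adds `ignore_index=True` so the stacked frame gets a fresh 0..n-1 index; that option is a suggestion, not part of the original code.

```python
import numpy as np
import pandas as pd

# Toy frames standing in for the PAC and PACS datasets
df_pac = pd.DataFrame({"PAC 주입률": [12.0, 8.5]})
df_pacs = pd.DataFrame({"PAC 주입률": [0.0, np.nan]})

# Stack vertically and renumber the rows
combined = pd.concat([df_pac, df_pacs], axis=0, ignore_index=True)

# 0 where the PAC dosing rate is above 0, otherwise 1
# (NaN > 0 is False, so missing values get 1, the same as the lambda version)
combined["구분"] = np.where(combined["PAC 주입률"] > 0, 0, 1)
print(combined)
```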

import matplotlib.pyplot as plt

# Monthly mean of '창원권_반송(정) 원수 pH' (Changwon area, Bansong, raw water pH)
monthly_ph = df.groupby('month')['창원권_반송(정) 원수 pH'].mean()

plt.figure(figsize=(10, 6))
plt.plot(monthly_ph.index, monthly_ph.values, marker='o', linestyle='-')
plt.title('월별 평균 pH')  # "Monthly mean pH"
plt.xlabel('월')  # "Month"
plt.ylabel('평균 pH')  # "Mean pH"
plt.xticks(range(1, 13))  # one x-axis tick per month
plt.grid(True)
plt.show()

# Compute the mean PAC_ppm by year and month (to look at the trend)
yearly_monthly_pac_avg = df.groupby(['year', 'month'])['PAC_ppm'].mean().reset_index()

# Draw one line per year on a single figure to compare the monthly PAC_ppm values
years = yearly_monthly_pac_avg['year'].unique()

plt.figure(figsize=(12, 8))

for year in years:
    subset = yearly_monthly_pac_avg[yearly_monthly_pac_avg['year'] == year]
    plt.plot(subset['month'], subset['PAC_ppm'], marker='o', linestyle='-', label=f'연도 {year}')  # "Year {year}"

plt.title('연도별 월별 PAC_ppm 값')  # "Monthly PAC_ppm by year"
plt.xlabel('월')  # "Month"
plt.ylabel('평균 원수 PAC_ppm 값')  # "Mean raw water PAC_ppm"
plt.legend()
plt.grid(True)

plt.show()
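
The year-by-month means can also be laid out as a table with `pivot_table`, which gives one column per year and one row per month. A small sketch on made-up numbers (the `toy` frame is hypothetical; only the `year`, `month`, and `PAC_ppm` column names come from the code above):

```python
import pandas as pd

# Toy data, for illustration only
toy = pd.DataFrame({
    "year":    [2013, 2013, 2013, 2014, 2014],
    "month":   [1, 1, 2, 1, 2],
    "PAC_ppm": [10.0, 12.0, 14.0, 20.0, 22.0],
})

# Rows: month, columns: year, cells: mean PAC_ppm
table = toy.pivot_table(index="month", columns="year", values="PAC_ppm", aggfunc="mean")
print(table)
```

Calling `table.plot(marker="o")` on such a table draws one line per year, which should match the loop-based figure without the explicit loop.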
