Machine Learning with Python Libraries (data set)


데이터_박과장 2023. 3. 30. 13:48

The `sklearn.datasets` module ships with several toy datasets.

In this post I want to get a better understanding of the datasets used in the book.

load_boston: Boston housing price data (removed in scikit-learn 1.2)
load_iris: Iris flower data
load_diabetes: Diabetes patient data
load_breast_cancer: Wisconsin breast cancer patient data
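
As a quick check of the loaders listed above, here is a minimal sketch that loads three of them and prints the size of each feature matrix. It assumes a recent scikit-learn install; `load_boston` is left out because it is no longer available from scikit-learn 1.2 onward. The `loaders` and `shapes` names are just for illustration.

```python
from sklearn.datasets import load_breast_cancer, load_diabetes, load_iris

# load_boston is intentionally omitted: it was removed in scikit-learn 1.2.
loaders = {
    "iris": load_iris,
    "diabetes": load_diabetes,
    "breast_cancer": load_breast_cancer,
}

# Map each dataset name to the (n_samples, n_features) shape of its feature matrix.
shapes = {name: loader().data.shape for name, loader in loaders.items()}

for name, shape in shapes.items():
    print(f"{name}: {shape[0]} samples, {shape[1]} features")
```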

A detailed description is available on the scikit-learn site.

https://scikit-learn.org/stable/datasets/toy_dataset.html?#iris-dataset


load_iris: Iris flower data

Loading a toy dataset returns an object of the class sklearn.utils.Bunch.
A Bunch stores key-value pairs and behaves much like a dictionary (dict). It is in fact a dict subclass whose keys can also be read as attributes.
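
The dict-like behavior can be checked directly. The following is a small sketch, not taken from the book; the variable names `is_bunch`, `is_dict`, and `same_object` are only there for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.utils import Bunch

iris = load_iris()

# The loader returns a Bunch, which is also a regular dict.
is_bunch = isinstance(iris, Bunch)
is_dict = isinstance(iris, dict)

# Key access and attribute access return the very same object.
same_object = iris["data"] is iris.data

print(is_bunch, is_dict, same_object)
```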

The keys are as follows.

data: Feature data, a NumPy array (ndarray) or a pandas DataFrame
target: Label data, a NumPy array (ndarray) or a pandas Series
feature_names: Names of the features, a list
target_names: Names of the label classes, a NumPy array (ndarray) for load_iris
DESCR: Description of the dataset, a string
filename: Location of the dataset file (csv), a string
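
To see which type each key actually holds, here is a minimal sketch. The second half assumes pandas is installed, because `as_frame=True` makes the loader return a DataFrame and a Series instead of NumPy arrays. `iris_np` and `iris_df` are illustrative names.

```python
from sklearn.datasets import load_iris

# Default call: data and target come back as NumPy arrays.
iris_np = load_iris()
print(type(iris_np.data).__name__, iris_np.data.shape)
print(type(iris_np.target).__name__, iris_np.target.shape)
print(iris_np.feature_names)
print(iris_np.target_names)
print(type(iris_np.DESCR).__name__)

# as_frame=True (requires pandas): data is a DataFrame, target is a Series.
iris_df = load_iris(as_frame=True)
print(type(iris_df.data).__name__, type(iris_df.target).__name__)
```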

For reference, data and feature_names hold the feature data (petal length and width, sepal length and width), while target and target_names hold the label data (the three iris species).

All of this can be loaded at once into a single iris object, as done earlier, but each item can also be pulled out individually by its key.
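
The original code for this step is not available in the post, so the following is only a sketch of one way to do it. It reads each item from the Bunch by attribute, and also uses the `return_X_y=True` option of `load_iris` to get only the features and labels. The variable names are assumptions for illustration.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Pick out each item individually from the Bunch.
X = iris.data                        # feature matrix
y = iris.target                      # integer class labels
feature_names = iris.feature_names   # column names for X
target_names = iris.target_names     # species names indexed by label

# Alternatively, ask the loader for just the (data, target) pair.
X2, y2 = load_iris(return_X_y=True)

print(X.shape, y.shape)
print(feature_names)
print(target_names[y[0]])  # species name of the first sample
```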
