回归问题
官方新方法封装
from tpf.datasets import load_boston X_train, y_train, X_test, y_test = load_boston(split=True,test_size=0.15)
shape:(430, 13), type:class 'numpy.ndarray', X_train shape:(430,), type:class 'numpy.ndarray', y_train
官方旧方法
from sklearn.datasets import load_boston X,y = load_boston(return_X_y=True) ============== ============== Samples total 506 Dimensionality 13 Features real, positive Targets real 5. - 50. ============== ==============
Dimensionality:维度,13个 Features 特征:实数,正的 Targets 标签:实数,浮点数,[5.0,50.0]
Signature: load_boston(*, return_X_y=False) Docstring: DEPRECATED: `load_boston` is deprecated in 1.0 and will be removed in 1.2. return_X_y=True 返回一个包含各种信息的pandas数据集,还需要自己从中提取特征列,标签 X:大写字母,意味这是一个矩阵,至少是二维 y:意味这是一个向量,一维
from sklearn.datasets import load_breast_cancer from sklearn.model_selection import train_test_split # 加载数据集 data = load_breast_cancer() X = data.data y = data.target # 划分训练集和测试集 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) X_train.shape,y_train.shape ((455, 30), (455,)) |
二分类问题:ruxianai 自定义一个数据集ds,封装一些常用小数据集 data = datasets.load_breast_cancer() X = data.data # numpy y = data.target X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=random_state) |
import numpy as np import pandas as pd from sklearn.datasets import load_breast_cancer from sklearn.model_selection import train_test_split # 加载乳腺癌数据集 data = load_breast_cancer() X = pd.DataFrame(data.data, columns=data.feature_names) y = pd.DataFrame(data.target,columns=['target']) columns = X.columns df = pd.concat([X,y],axis=1) |
|
|
多分类问题:iris
def load_iris(split=True): """ split:True,拆分数据集为训练集与测试集,False为不拆分 data_list = ds.load_iris() 或 X_train, y_train, X_test, y_test = ds.load_iris() 或 X_train, y_train = ds.load_iris(split=False) """ data = datasets.load_iris() if split: x_train, x_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.20, random_state=73) return x_train, y_train, x_test, y_test else: return data.data, data.target