First version of the pipeline

2025-11-08 13:39:02 +08:00
parent dcfe2d84d5
commit a66e42a8ae
11 changed files with 1648 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -0,0 +1,211 @@
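+# Time-Series Factor Mining Framework
+
+A concise, flexible framework for mining, validating, and backtesting time-series factors and for generating trading signals from them.
+
+## Features
+
+- **Pipeline design**: clearly separated steps that are easy to follow and extend
+- **Highly flexible**: supports custom factors, weighting methods, and signal rules
+- **Concise code**: avoids over-engineering and keeps the core logic readable
+- **End-to-end workflow**: a complete chain from data preprocessing to signal generation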
+
+## Project Structure
+
+```
+factorhack/
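+├── data.py          # Data loading and preprocessing
+├── factors.py       # Factor mining (rule-based factors, GP factors)
+├── validation.py    # Factor validation (IC, group backtests, regression)
+├── combination.py   # Factor combination (multi-factor model)
+├── backtest.py      # Backtest engine
+├── signal.py        # Signal generation
+├── pipeline.py      # Main pipeline
+├── example.py       # Usage examples
+└── README.md        # Documentation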
+```
+
+## Quick Start
+
+### 1. Install Dependencies
+
+```bash
+pip install pandas numpy scipy statsmodels
+```
+
+### 2. Basic Usage
+
+```python
+from pipeline import FactorPipeline
+
+# Create the pipeline
+pipeline = FactorPipeline(
+    ret_horizon=1,      # forward return horizon (next 1 bar)
+    ic_window=30,       # IC calculation window
+    commission=0.001,   # commission 0.1%
+    slippage=0.0005     # slippage 0.05%
+)
+
+# Run the full pipeline
+results = pipeline.run_full_pipeline(
+    file_path="ETH_USDT-1h.feather",
+    min_ic=0.01,        # minimum IC threshold
+    min_tstat=1.5,      # minimum t-statistic
+    weight_method='risk_parity',
+    buy_threshold=0.8,
+    sell_threshold=-0.8
+)
+```
+
+### 3. Step-by-Step Execution
+
+```python
+pipeline = FactorPipeline()
+
+# Step 1: load and preprocess the data
+pipeline.load_and_preprocess("ETH_USDT-1h.feather")
+
+# Step 2: mine factors
+pipeline.mine_factors()
+
+# Step 3: validate factors
+pipeline.validate_factors(min_ic=0.01, min_tstat=1.5)
+
+# Step 4: combine factors
+pipeline.combine_factors(weight_method='risk_parity')
+
+# Step 5: generate signals
+signals = pipeline.generate_signals(buy_threshold=0.8, sell_threshold=-0.8)
+
+# Step 6: backtest
+backtest_results = pipeline.backtest(signals)
+```
+
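+## Core Modules
+
+### 1. Data Module (`data.py`)
+
+- `load_data()`: load data (feather and CSV are supported)
+- `compute_technical_indicators()`: compute technical indicators
+- `preprocess_data()`: preprocess data (outliers, missing values, standardization)
+- `compute_forward_returns()`: compute forward returns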
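As a reference point for `compute_forward_returns()` and the standardization step of `preprocess_data()`, here is a minimal pandas sketch. It assumes a `close` price Series. The helper names are hypothetical, and the actual `data.py` functions may handle outliers and missing values differently (for example, with other windows or clipping bounds).

```python
import numpy as np
import pandas as pd


def forward_returns_sketch(close: pd.Series, horizon: int = 1) -> pd.Series:
    """Simple return from bar t to bar t + horizon (hypothetical helper).

    The last `horizon` values are NaN because their future prices are unknown.
    """
    return close.shift(-horizon) / close - 1.0


def rolling_zscore_sketch(x: pd.Series, window: int = 30, clip: float = 3.0) -> pd.Series:
    """Standardize with trailing statistics only, then clip extreme values.

    A full-sample z-score would use future data; rolling statistics do not.
    """
    x = x.ffill()  # fill gaps with the last known value
    mean = x.rolling(window).mean()
    std = x.rolling(window).std()
    z = (x - mean) / std.replace(0.0, np.nan)  # avoid dividing by zero variance
    return z.clip(-clip, clip)


close = pd.Series([100.0, 101.0, 99.0, 102.0])
print(forward_returns_sketch(close, horizon=1))  # the last value is NaN
```

The design point is that the forward return at bar t looks ahead by `horizon` bars and serves only as the validation target. Every feature is built from data available up to bar t.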
+
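+### 2. Factor Module (`factors.py`)
+
+- `BaseFactor`: base class for factors
+- `RuleFactor`: rule-based factor
+- `FactorMiner`: factor miner
+- `create_default_factors()`: create the default factor set
+
+**Default factors**:
+- `TREND`: trend factor
+- `VOL`: volatility factor
+- `VOLP`: volume-price factor
+- `REV`: reversal factor
+- `MOM`: momentum factor
+- `RSI`: RSI factor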
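The factor names above do not define formulas, so the following is for illustration only. It sketches a minimal registry in the spirit of `FactorMiner` and `register_rule_factor`, plus two hypothetical rule factors: `MOM` as an n-bar return and `RSI` with Wilder-style exponential smoothing. The class name, function names, and default windows are assumptions. The real definitions in `factors.py` may differ.

```python
from typing import Callable, Dict

import pandas as pd

FactorFunc = Callable[[pd.DataFrame], pd.Series]


class SimpleFactorRegistry:
    """Hypothetical stand-in for a factor miner: maps a name to a rule function."""

    def __init__(self) -> None:
        self._rules: Dict[str, FactorFunc] = {}

    def register(self, name: str, func: FactorFunc) -> None:
        self._rules[name] = func

    def compute_all(self, data: pd.DataFrame) -> pd.DataFrame:
        # Evaluate every registered rule on the same OHLCV frame.
        return pd.DataFrame({name: f(data) for name, f in self._rules.items()}, index=data.index)


def momentum(data: pd.DataFrame, n: int = 12) -> pd.Series:
    """Illustrative MOM: n-bar simple return of close."""
    close = data["close"]
    return close / close.shift(n) - 1.0


def rsi(data: pd.DataFrame, n: int = 14) -> pd.Series:
    """Illustrative RSI with Wilder-style smoothing (alpha = 1/n), in [0, 100]."""
    delta = data["close"].diff()
    avg_gain = delta.clip(lower=0).ewm(alpha=1.0 / n, adjust=False).mean()
    avg_loss = (-delta).clip(lower=0).ewm(alpha=1.0 / n, adjust=False).mean()
    # 100 - 100 / (1 + gain/loss), rewritten to avoid dividing by a zero loss.
    return 100.0 * avg_gain / (avg_gain + avg_loss)


registry = SimpleFactorRegistry()
registry.register("MOM", momentum)
registry.register("RSI", rsi)
```

Keeping each factor as a plain `DataFrame -> Series` function makes factors easy to add, remove, or test in isolation.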
+
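+### 3. Validation Module (`validation.py`)
+
+- `compute_ic()`: compute the IC (information coefficient)
+- `compute_rolling_ic()`: compute the rolling IC
+- `group_backtest()`: quantile group backtest
+- `factor_span_regression()`: factor spanning regression
+- `validate_factor()`: run the combined set of factor checks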
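The checks above can be sketched as follows, assuming a factor Series and an aligned forward-return Series. Several definitions here are assumptions, not necessarily what `validation.py` does:

- The IC is the Spearman rank correlation.
- The rolling IC is a plain rolling Pearson correlation.
- The t-statistic is the textbook one for a correlation coefficient. It overstates significance when forward returns overlap (`ret_horizon > 1`).
- `passes` compares absolute values against `min_ic` and `min_tstat`.

All function names are hypothetical.

```python
import numpy as np
import pandas as pd


def rank_ic(factor: pd.Series, fwd_ret: pd.Series) -> float:
    """Spearman IC, computed as the Pearson correlation of ranks (no SciPy needed)."""
    df = pd.concat({"f": factor, "r": fwd_ret}, axis=1).dropna()
    ranks = df.rank()
    return float(ranks["f"].corr(ranks["r"]))


def ic_tstat(ic: float, n: int) -> float:
    """t = r * sqrt((n - 2) / (1 - r^2)); valid only for independent observations."""
    return float(ic * np.sqrt((n - 2) / (1.0 - ic ** 2)))


def rolling_ic(factor: pd.Series, fwd_ret: pd.Series, window: int = 30) -> pd.Series:
    """Rolling Pearson correlation between the factor and forward returns."""
    return factor.rolling(window).corr(fwd_ret)


def group_means(factor: pd.Series, fwd_ret: pd.Series, n_groups: int = 5) -> pd.Series:
    """Mean forward return per factor quantile (group 0 = lowest factor values)."""
    df = pd.concat({"f": factor, "r": fwd_ret}, axis=1).dropna()
    df["g"] = pd.qcut(df["f"], n_groups, labels=False, duplicates="drop")
    return df.groupby("g")["r"].mean()


def passes(ic: float, tstat: float, min_ic: float = 0.01, min_tstat: float = 1.5) -> bool:
    """Keep a factor whose |IC| and |t| clear both thresholds (sign-agnostic)."""
    return abs(ic) >= min_ic and abs(tstat) >= min_tstat


rng = np.random.default_rng(0)
f = pd.Series(rng.normal(size=500))
r = 0.5 * f + pd.Series(rng.normal(size=500))  # synthetic, factor-driven returns
ic = rank_ic(f, r)
print(ic, ic_tstat(ic, 500), passes(ic, ic_tstat(ic, 500)))
```

A monotonic pattern in `group_means` (rising or falling across quantiles) is usually more convincing than a single IC number, because it shows the relationship holds across the whole range of factor values.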
+
+### 4. Combination Module (`combination.py`)
+
+- `risk_parity_weights()`: risk-parity weights
+- `regression_weights()`: regression-coefficient weights
+- `equal_weights()`: equal weights
+- `MultiFactorModel`: multi-factor model
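The three weighting schemes can be sketched as below, with two assumptions to flag:

- "Risk parity" here is the naive inverse-volatility version, which ignores correlations between factors. A full risk-parity solver equalizes risk contributions using the covariance matrix.
- "Regression weights" are OLS slopes of forward returns on the factor columns, normalized by the sum of their absolute values.

The function names are hypothetical and need not match `combination.py`.

```python
import numpy as np
import pandas as pd


def equal_weights_sketch(factors: pd.DataFrame) -> pd.Series:
    """Same weight for every factor column."""
    return pd.Series(1.0 / factors.shape[1], index=factors.columns)


def inverse_vol_weights(factors: pd.DataFrame) -> pd.Series:
    """Naive risk parity: weight proportional to 1 / std, ignoring correlations."""
    inv = 1.0 / factors.std()
    return inv / inv.sum()


def ols_weights(factors: pd.DataFrame, fwd_ret: pd.Series) -> pd.Series:
    """OLS slopes (with intercept) of forward returns on factors, scaled so sum(|w|) = 1."""
    df = pd.concat([factors, fwd_ret.rename("_ret")], axis=1).dropna()
    X = np.column_stack([np.ones(len(df)), df[factors.columns].to_numpy()])
    beta, *_ = np.linalg.lstsq(X, df["_ret"].to_numpy(), rcond=None)
    w = pd.Series(beta[1:], index=factors.columns)  # drop the intercept
    return w / w.abs().sum()


def combined_score(factors: pd.DataFrame, weights: pd.Series) -> pd.Series:
    """Weighted sum of factor columns; assumes factors are already standardized."""
    return factors[weights.index].mul(weights, axis=1).sum(axis=1, min_count=1)
```

Inverse-volatility weights need no return data and are stable. Regression weights adapt to predictive power but are more prone to overfitting when factors are collinear.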
+
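+### 5. Backtest Module (`backtest.py`)
+
+- `BacktestEngine`: backtest engine
+- Supports commission and slippage
+- Computes metrics such as annualized return, Sharpe ratio, maximum drawdown, and win rate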
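A vectorized sketch of such an engine rests on these assumptions:

- A position in {-1, 0, 1} is decided at each bar's close and applied to the next bar's return.
- Costs are charged as `(commission + slippage) * |change in position|`.
- The annualization factor of 24 × 365 bars assumes hourly crypto data trading around the clock.
- The win rate is per bar (share of positive-return bars while in the market), not per round-trip trade.

This illustrates the idea. It is not the `BacktestEngine` API.

```python
import numpy as np
import pandas as pd


def backtest_sketch(close: pd.Series, position: pd.Series, commission: float = 0.001,
                    slippage: float = 0.0005, periods_per_year: int = 24 * 365) -> dict:
    """Vectorized long/short backtest; position in {-1, 0, 1} is set at each bar's close."""
    ret = (close / close.shift(1) - 1.0).fillna(0.0)
    # Apply the previous bar's position to this bar's return to avoid look-ahead bias.
    pos = position.reindex(close.index).fillna(0.0).shift(1).fillna(0.0)
    # Turnover = absolute change in position; costs are charged on every change.
    turnover = pos.diff().abs().fillna(pos.abs())
    strat = pos * ret - turnover * (commission + slippage)
    equity = (1.0 + strat).cumprod()

    n = len(strat)
    ann_return = equity.iloc[-1] ** (periods_per_year / n) - 1.0
    std = strat.std()
    sharpe = strat.mean() / std * np.sqrt(periods_per_year) if std > 0 else 0.0
    max_dd = (equity / equity.cummax() - 1.0).min()
    in_market = strat[pos != 0]
    win_rate = (in_market > 0).mean() if len(in_market) else 0.0
    return {"equity": equity, "annual_return": float(ann_return), "sharpe": float(sharpe),
            "max_drawdown": float(max_dd), "win_rate": float(win_rate)}
```

Shifting the position by one bar is the essential detail. Without the shift, the strategy would trade on the same close that produced its signal.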
+
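+### 6. Signal Module (`signal.py`)
+
+- `generate_signals()`: generate buy/sell signals from the factor score
+- Supports thresholds based on a rolling standard deviation
+- Avoids overly frequent trading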
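One way to read "rolling standard deviation thresholds" and "avoids overly frequent trading" is sketched below, under these assumptions:

- The score is normalized by its trailing mean and standard deviation.
- A buy (+1) or sell (-1) fires only when the normalized score crosses `buy_threshold` or `sell_threshold`.
- Between crossings the previous signal is held, so small oscillations inside the band do not flip the position.

This README does not specify whether -1 means "short" or "exit to flat" in `signal.py`. The function name `threshold_signals` is hypothetical.

```python
import numpy as np
import pandas as pd


def threshold_signals(score: pd.Series, buy_threshold: float = 0.8,
                      sell_threshold: float = -0.8, window: int = 30) -> pd.Series:
    """+1 / -1 when the rolling z-score crosses a threshold; hold the last signal otherwise."""
    mean = score.rolling(window).mean()
    std = score.rolling(window).std()
    z = (score - mean) / std.replace(0.0, np.nan)

    raw = pd.Series(np.nan, index=score.index)
    raw.loc[z > buy_threshold] = 1.0
    raw.loc[z < sell_threshold] = -1.0
    # Inside the band, keep the previous signal; before the first crossing, stay at 0.
    return raw.ffill().fillna(0.0)


print(threshold_signals(pd.Series([0.0, 1, 0, 1, 0, 10, -10]), window=5).tolist())
# → [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, -1.0]
```

The gap between `buy_threshold` and `sell_threshold` acts as hysteresis: the wider it is, the fewer trades the rule produces.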
+
+## Custom Extensions
+
+### Adding a Custom Factor
+
+```python
+from factors import create_default_factors
+import pandas as pd
+
+def my_custom_factor(data: pd.DataFrame) -> pd.Series:
+    """Custom factor: relative distance of close from the `ema8` column."""
+    return (data['close'] - data['ema8']) / data['ema8']
+
+miner = create_default_factors()
+miner.register_rule_factor('CUSTOM', my_custom_factor)
+```
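To sanity-check a custom factor before registering it, you can run it on a toy frame. This sketch assumes `ema8` is an 8-period exponential moving average of `close`, computed here as `ewm(span=8, adjust=False)`. The column that `compute_technical_indicators()` actually produces may be defined differently.

```python
import pandas as pd


def my_custom_factor(data: pd.DataFrame) -> pd.Series:
    """Relative distance of close from its `ema8` column."""
    return (data["close"] - data["ema8"]) / data["ema8"]


toy = pd.DataFrame({"close": [10.0, 10.5, 11.0, 10.8, 11.5, 12.0]})
# Assumption: ema8 is an 8-span EMA of close.
toy["ema8"] = toy["close"].ewm(span=8, adjust=False).mean()
print(my_custom_factor(toy).round(4).tolist())
```

On a mostly rising series, the factor should be positive, because close stays above its lagging EMA.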
+
+### Using a Different Weighting Method
+
+```python
+# Risk parity
+pipeline.combine_factors(weight_method='risk_parity')
+
+# Regression coefficients
+pipeline.combine_factors(weight_method='regression')
+
+# Equal weights
+pipeline.combine_factors(weight_method='equal')
+```
+
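+### Custom Signal Rules
+
+```python
+# Caution: the local signal.py shares its name with Python's standard-library
+# `signal` module, so depending on import order one can shadow the other.
+from signal import generate_signals
+
+signals = generate_signals(
+    score=pipeline.score,
+    buy_threshold=1.0,   # buy threshold
+    sell_threshold=-1.0, # sell threshold
+    window=30            # rolling window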
+)
+```
+
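+## Data Format Requirements
+
+The input data must contain the following columns:
+- `open`: open price
+- `high`: high price
+- `low`: low price
+- `close`: close price
+- `volume`: trading volume
+
+Optional time column (used to set the index), any one of: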
+- `datetime`, `time`, `timestamp`, `date`
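A minimal sketch of checking these requirements and setting the index follows. It assumes the first matching time column, in the order listed above, wins; the actual precedence in `load_data()` is not documented here. `read_table` and `prepare_ohlcv` are hypothetical names. Reading `.feather` files with pandas requires `pyarrow`.

```python
from pathlib import Path

import pandas as pd

REQUIRED = ["open", "high", "low", "close", "volume"]
TIME_COLUMNS = ["datetime", "time", "timestamp", "date"]


def read_table(path: str) -> pd.DataFrame:
    """Read a .feather (requires pyarrow) or .csv file, chosen by suffix."""
    suffix = Path(path).suffix.lower()
    if suffix == ".feather":
        return pd.read_feather(path)
    if suffix == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"unsupported file type: {suffix}")


def prepare_ohlcv(df: pd.DataFrame) -> pd.DataFrame:
    """Validate OHLCV columns and index the frame by its first time column, if any."""
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    for col in TIME_COLUMNS:  # assumption: first match in this order wins
        if col in df.columns:
            df = df.set_index(pd.to_datetime(df[col])).drop(columns=col)
            break
    return df.sort_index()
```

Sorting by time after setting the index matters, because every rolling and shifted calculation downstream assumes chronological order.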
+
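+## Output
+
+After the pipeline finishes, the following results are available:
+- **Factor data** (`factors`): time series of all valid factors
+- **Composite score** (`score`): combined multi-factor score
+- **Validation results** (`validation`): IC, t-statistic, and other statistics for each factor
+- **Backtest results** (`backtest`): equity curve, backtest metrics, and trade records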
+
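+## Notes
+
+1. Data quality: make sure the input data has no severe gaps or anomalies
+2. Parameter tuning: adjust thresholds and window parameters to the characteristics of your data
+3. Overfitting risk: avoid over-optimizing parameters in-sample
+4. Live vs. backtest: backtest results are for reference only; live trading may suffer from slippage, latency, and similar issues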
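For point 3, one simple safeguard is to tune thresholds and windows on an earlier slice of history only, then evaluate once on a later, untouched slice. The sketch below assumes a time-sorted DataFrame. `chronological_split` and its `gap` parameter are hypothetical, not part of the framework.

```python
import pandas as pd


def chronological_split(df: pd.DataFrame, train_frac: float = 0.7, gap: int = 0):
    """Split a time-sorted frame into earlier (train) and later (test) parts.

    No shuffling. `gap` rows are dropped between the two parts so that forward
    returns computed near the end of the training slice do not use test prices.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must be strictly between 0 and 1")
    cut = int(len(df) * train_frac)
    return df.iloc[:cut], df.iloc[cut + gap:]
```

Setting `gap` to the forward-return horizon (`ret_horizon`) keeps the last training labels from overlapping the test period.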
+
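+## References
+
+- `TS因子挖掘构建流程.md` ("TS factor mining workflow"): detailed theory and methods for factor mining
+- `deap_factor_mining.py`: example of factor mining with genetic programming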
+
+## License
+
+MIT License
+