# ProStock

ProStock is a quantitative investment framework for China's A-share market, built for quantitative stock analysis.

## Features

- **Data management**: fetches market data through the Tushare API and stores it locally in DuckDB
- **Factor engine**: a high-performance factor computation framework driven by DSL expressions (built on Polars)
- **Machine learning**: supports LightGBM regression and LambdaRank learning-to-rank
- **Component-based design**: flexible, composable data processors, stock pool management, and filters

## Requirements

- Python 3.10+
- uv package manager

## Installation

```bash
# Clone the project (repository URL not listed here), then enter it
cd ProStock

# Install dependencies with uv
uv pip install -e .
```

## Configuration

Create the file `config/.env.local`:

```bash
# Tushare token (required)
TUSHARE_TOKEN=your_token_here

# Data storage path (optional, default: data/)
DATA_PATH=data

# API rate limit (optional, default: 100)
RATE_LIMIT=100

# Number of concurrent threads (optional, default: 10)
THREADS=10
```
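
To make the required and optional keys concrete, here is a minimal sketch of how a file in this format could be read with the documented defaults applied. It is not ProStock's actual loader: `parse_env_text`, `build_settings`, and `load_settings` are hypothetical names, and the only facts taken from the README are the key names, the defaults, and the file path.

```python
from pathlib import Path

# Defaults mirror the optional values documented in the README.
DEFAULTS = {"DATA_PATH": "data", "RATE_LIMIT": "100", "THREADS": "10"}


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def build_settings(text):
    """Hypothetical helper: merge defaults, validate, and convert types."""
    merged = {**DEFAULTS, **parse_env_text(text)}
    if not merged.get("TUSHARE_TOKEN"):
        raise ValueError("TUSHARE_TOKEN is required")
    return {
        "tushare_token": merged["TUSHARE_TOKEN"],
        "data_path": merged["DATA_PATH"],
        "rate_limit": int(merged["RATE_LIMIT"]),
        "threads": int(merged["THREADS"]),
    }


def load_settings(path="config/.env.local"):
    """Hypothetical entry point: read the file if present, then build settings."""
    env_file = Path(path)
    text = env_file.read_text(encoding="utf-8") if env_file.exists() else ""
    return build_settings(text)
```

With only `TUSHARE_TOKEN` set, `build_settings` falls back to `data`, `100`, and `10` for the other three values; with no token it raises `ValueError`.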

## Quick start

### 1. Sync stock data

```python
from src.data.sync import sync_all

# Incremental sync (default)
sync_all()

# Force a full sync
sync_all(force_full=True)

# Use a custom number of worker threads
sync_all(max_workers=20)
```
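
The README does not describe how `sync_all` works internally. As a rough mental model only, an incremental sync usually resumes from the day after the last stored date, while a full sync restarts from a fixed origin, and a thread pool fans the work out per stock. The sketch below illustrates that idea; `plan_start`, `sync_codes`, `FULL_START`, and the `fetch` callback are all hypothetical and are not part of ProStock.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta
from typing import Optional

FULL_START = "20180101"  # assumed origin date for a full sync


def plan_start(last_stored: Optional[str], force_full: bool = False) -> str:
    """Return the first YYYYMMDD date that still needs to be fetched."""
    if force_full or last_stored is None:
        return FULL_START
    next_day = datetime.strptime(last_stored, "%Y%m%d").date() + timedelta(days=1)
    return next_day.strftime("%Y%m%d")


def sync_codes(codes, last_dates, fetch, max_workers=10, force_full=False):
    """Run `fetch(code, start)` for every code on a thread pool.

    `fetch` is a stand-in for whatever actually downloads and stores data.
    """
    def job(code):
        start = plan_start(last_dates.get(code), force_full)
        return code, fetch(code, start)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(job, codes))
```

Here `max_workers` plays the same role as the `max_workers` argument shown for `sync_all`: it caps how many downloads run at once.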

### 2. Compute factors

```python
from src.factors import FactorEngine

# Initialize the engine
engine = FactorEngine()

# Add factors (string expressions are recommended)
engine.add_factor("ma20", "ts_mean(close, 20)")
engine.add_factor("alpha", "cs_rank(ts_mean(close, 5) - ts_mean(close, 20))")

# Compute factor values
result = engine.compute(["ma20", "alpha"], "20240101", "20240131")
```
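
The expression strings above happen to be valid Python expression syntax, so one way to inspect them is with the standard `ast` module. The sketch below lists the functions and data fields an expression refers to. It assumes the DSL stays Python-compatible and says nothing about how ProStock's own `dsl.py` parses expressions; `summarize_expression` is a hypothetical helper.

```python
import ast


def summarize_expression(expr):
    """Return (function names, field names) used in a factor expression."""
    tree = ast.parse(expr, mode="eval")

    # Names that appear in call position are treated as DSL functions.
    functions = {
        node.func.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    # Every other bare name is treated as a data field such as `close`.
    fields = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and node.id not in functions
    }
    return sorted(functions), sorted(fields)


functions, fields = summarize_expression(
    "cs_rank(ts_mean(close, 5) - ts_mean(close, 20))"
)
print(functions)  # ['cs_rank', 'ts_mean']
print(fields)     # ['close']
```

A check like this can catch a misspelled function or an unknown field before any data is loaded.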

### 3. Train a model

```python
from src.training import Trainer, DateSplitter, StockPoolManager
from src.training.components.models import LightGBMModel

# Create the model
model = LightGBMModel(params={
    "objective": "regression",
    "num_leaves": 20,
    "learning_rate": 0.01,
    "n_estimators": 1000,
})

# Create the date-based data splitter
splitter = DateSplitter(
    train_start="20200101",
    train_end="20231231",
    val_start="20240101",
    val_end="20241231",
    test_start="20250101",
    test_end="20251231",
)

# Create the trainer and train
trainer = Trainer(
    model=model,
    splitter=splitter,
    target_col="future_return_5",
    feature_cols=["ma_5", "ma_20", "volume_ratio"],
)

# `data` must already contain the feature and target columns named above
trainer.train(data)
results = trainer.get_results()
```
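
The splitter above is configured with three non-overlapping date ranges, which keeps validation and test data strictly later than the training data. The following is a minimal sketch of that idea in plain Python, not the `DateSplitter` implementation. The `trade_date` key and the `split_by_date` function are assumptions made for illustration.

```python
def split_by_date(rows, train, val, test):
    """Assign rows to train/val/test using inclusive (start, end) ranges.

    Each row is a dict with a `trade_date` value in YYYYMMDD form
    (an assumed column name). Rows outside every range are dropped.
    """
    ranges = {"train": train, "val": val, "test": test}
    parts = {name: [] for name in ranges}
    for row in rows:
        day = row["trade_date"]
        for name, (start, end) in ranges.items():
            # Fixed-width YYYYMMDD strings sort in chronological order.
            if start <= day <= end:
                parts[name].append(row)
                break
    return parts


rows = [
    {"trade_date": "20231229", "ma_5": 10.2},
    {"trade_date": "20240102", "ma_5": 10.4},
    {"trade_date": "20250303", "ma_5": 11.0},
]
parts = split_by_date(
    rows,
    train=("20200101", "20231231"),
    val=("20240101", "20241231"),
    test=("20250101", "20251231"),
)
```

Splitting by calendar range rather than at random avoids leaking future information into training, which matters for a target such as `future_return_5` that looks ahead in time.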

## Project structure

```
ProStock/
├── src/
│   ├── config/             # Configuration management
│   ├── data/               # Data fetching and storage
│   │   ├── api_wrappers/   # Tushare API wrappers
│   │   ├── storage.py      # DuckDB storage
│   │   └── sync.py         # Data sync scheduling
│   ├── factors/            # Factor computation framework
│   │   ├── engine/         # Execution engine
│   │   ├── metadata/       # Factor metadata management
│   │   ├── dsl.py          # DSL expression layer
│   │   └── translator.py   # Polars translator
│   └── training/           # Training module
│       ├── core/           # Training core
│       └── components/     # Components (models, processors, filters)
├── tests/                  # Tests
├── data/                   # Data storage
└── docs/                   # Documentation
```

## Factor framework

### Supported functions

**Time-series functions (ts_*)**:
- `ts_mean`, `ts_std`, `ts_max`, `ts_min`, `ts_sum`
- `ts_delay`, `ts_delta`
- `ts_corr`, `ts_cov`, `ts_rank`

**Cross-sectional functions (cs_*)**:
- `cs_rank` - cross-sectional rank
- `cs_zscore` - z-score standardization
- `cs_neutralize` - industry / market-cap neutralization
- `cs_winsorize` - winsorization

**Math functions**:
- `log`, `exp`, `sqrt`, `sign`, `abs`
- `max_`, `min_`, `clip`
- `if_`, `where`
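
The difference between the two families is the axis they work along: `ts_*` functions look back over time for one stock, while `cs_*` functions compare all stocks on one date. The plain-Python sketch below shows the usual meaning of four of these names on small lists. It illustrates semantics only and is not ProStock's Polars-based implementation; the handling of incomplete windows, ties, and rank scaling are assumptions.

```python
from statistics import mean, pstdev


def ts_mean(series, window):
    """Rolling mean over one stock's history; None until the window is full."""
    return [
        mean(series[i - window + 1 : i + 1]) if i + 1 >= window else None
        for i in range(len(series))
    ]


def ts_delay(series, periods):
    """Value from `periods` steps earlier; None where no earlier value exists."""
    return ([None] * periods + list(series))[: len(series)]


def cs_rank(values):
    """Rank one date's values across stocks, scaled into (0, 1].

    Ties keep their input order here; a real implementation may average them.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position / len(values)
    return ranks


def cs_zscore(values):
    """Standardize one date's values (assumes they are not all identical)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]
```

Read this way, an expression such as `cs_rank(close / ts_delay(close, 5) - 1)` first computes a 5-day return per stock along the time axis and then ranks those returns across stocks on each date.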

### Factor metadata management

```python
from src.factors.metadata import FactorManager

# Initialize the manager
manager = FactorManager()

# Add a factor
manager.add_factor({
    "factor_id": "F_001",
    "name": "mom_5d",
    "desc": "5-day price momentum",
    "dsl": "cs_rank(close / ts_delay(close, 5) - 1)",
    "category": "momentum",
})

# Query factors by name
df = manager.get_factors_by_name("mom_5d")
```
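
For readers who want to see the shape of such a registry without the project's storage layer, here is a small in-memory stand-in. It is not `FactorManager`: the class name is hypothetical, treating all five fields as required is an assumption, and it returns a plain list where the real `get_factors_by_name` appears to return a DataFrame.

```python
# Assumed required keys, taken from the example record in the README.
REQUIRED_FIELDS = ("factor_id", "name", "desc", "dsl", "category")


class InMemoryFactorRegistry:
    """Illustrative stand-in for a factor metadata store."""

    def __init__(self):
        self._records = {}

    def add_factor(self, record):
        """Validate a record and store it under its factor_id."""
        missing = [field for field in REQUIRED_FIELDS if not record.get(field)]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        if record["factor_id"] in self._records:
            raise ValueError(f"duplicate factor_id: {record['factor_id']}")
        self._records[record["factor_id"]] = dict(record)

    def get_factors_by_name(self, name):
        """Return every stored record whose name matches."""
        return [r for r in self._records.values() if r["name"] == name]


registry = InMemoryFactorRegistry()
registry.add_factor({
    "factor_id": "F_001",
    "name": "mom_5d",
    "desc": "5-day price momentum",
    "dsl": "cs_rank(close / ts_delay(close, 5) - 1)",
    "category": "momentum",
})
matches = registry.get_factors_by_name("mom_5d")
```

Keeping the DSL string next to the identifier and category means a factor can be recomputed from its metadata alone.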

## Common tasks

```bash
# Run all tests
uv run pytest

# Sync financial data
uv run python -c "from src.data.api_wrappers.financial_data import sync_financial; sync_financial()"

# Register factors in bulk
uv run python src/scripts/register_factors.py
```

## Dependencies

- pandas >= 2.0.0
- polars >= 0.20.0
- numpy >= 1.24.0
- tushare >= 2.0.0
- pydantic >= 2.0.0
- lightgbm >= 4.0.0
- pytest

## Documentation

See the `docs/` directory for more details:

- [Factor expression documentation](docs/factor_expressions_document.md)
- [API interface specification](docs/api/API_INTERFACE_SPEC.md)
- [Financial data API](docs/api/FINANCIAL_API_SPEC.md)

## License

MIT