# ProStock

A quantitative investment framework for analyzing China A-share stocks.

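## Features

- **Data management**: fetches market data through the Tushare API and stores it locally in DuckDB
- **Factor engine**: a high-performance factor computation framework driven by DSL expressions (built on Polars)
- **Machine learning**: supports LightGBM regression and LambdaRank learning-to-rank
- **Component-based design**: flexible, composable data processors, stock pool management, and filters
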
## Requirements

- Python 3.10+
- uv package manager

## Installation

```bash
# Clone the project, then enter its directory
cd ProStock

# Install dependencies with uv
uv pip install -e .
```

## Configuration

Create a `config/.env.local` file:

```bash
# Tushare token (required)
TUSHARE_TOKEN=your_token_here

# Data storage path (optional, default: data/)
DATA_PATH=data

# API rate limit (optional, default: 100)
RATE_LIMIT=100

# Number of concurrent threads (optional, default: 10)
THREADS=10
```

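How ProStock itself reads `config/.env.local` is not shown in this README. As a rough illustration only, the sketch below parses the same `KEY=VALUE` format with the standard library and applies the defaults stated in the comments above. The names `parse_env` and `DEFAULTS` are hypothetical and are not part of ProStock.

```python
# Hypothetical helper: not ProStock's actual config loader.
# Defaults mirror the "optional, default: ..." comments in the example file.
DEFAULTS = {"DATA_PATH": "data", "RATE_LIMIT": "100", "THREADS": "10"}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = dict(DEFAULTS)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    # The token is the only setting marked as required.
    if not values.get("TUSHARE_TOKEN"):
        raise ValueError("TUSHARE_TOKEN is required")
    return values


example = """
# Tushare token (required)
TUSHARE_TOKEN=your_token_here
THREADS=4
"""
config = parse_env(example)
```

Here `config["THREADS"]` comes from the file, while `DATA_PATH` and `RATE_LIMIT` fall back to their defaults. To read a real file, pass `Path("config/.env.local").read_text(encoding="utf-8")` from `pathlib`.
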
## Quick Start

### 1. Sync stock data

```python
from src.data.sync import sync_all

# Incremental sync (default)
sync_all()

# Force a full sync
sync_all(force_full=True)

# Use a custom number of worker threads
sync_all(max_workers=20)
```

### 2. Compute factors

```python
from src.factors import FactorEngine

# Initialize the engine
engine = FactorEngine()

# Add factors (string expressions are recommended)
engine.add_factor("ma20", "ts_mean(close, 20)")
engine.add_factor("alpha", "cs_rank(ts_mean(close, 5) - ts_mean(close, 20))")

# Compute factor values
result = engine.compute(["ma20", "alpha"], "20240101", "20240131")
```

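To make the two expressions above concrete, here is a plain-Python sketch of what a rolling mean along time (`ts_mean`) and a rank across stocks on one day (`cs_rank`) usually mean. It is not the engine's Polars implementation. Whether ProStock's `cs_rank` returns percentile or ordinal ranks, and how it breaks ties, are assumptions made here for illustration.

```python
def ts_mean(values, window):
    """Rolling mean over one stock's time series; None until the window fills."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def cs_rank(values):
    """Rank one trading day's values across stocks, scaled to (0, 1].

    Assumption: ascending percentile ranks; ties are broken by input order.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, i in enumerate(order, start=1):
        ranks[i] = position / len(values)
    return ranks


# Toy closing prices for three stocks over three days (made-up numbers).
closes = {
    "A": [10.0, 11.0, 12.0],
    "B": [20.0, 19.0, 18.0],
    "C": [5.0, 5.0, 8.0],
}

# Time-series step: a 2-day moving average per stock.
ma2 = {code: ts_mean(series, 2) for code, series in closes.items()}

# Cross-sectional step: rank the last day's moving averages across stocks.
codes = list(closes)
last_day_rank = dict(zip(codes, cs_rank([ma2[c][-1] for c in codes])))
```

The time-series function only ever looks at one stock's history, while the cross-sectional function only ever looks at one date. Composing them, as in the `alpha` factor, applies the time-series step first and the ranking second.
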
### 3. Train a model

```python
from src.training import Trainer, DateSplitter, StockPoolManager
from src.training.components.models import LightGBMModel

# Create the model
model = LightGBMModel(params={
    "objective": "regression",
    "num_leaves": 20,
    "learning_rate": 0.01,
    "n_estimators": 1000,
})

# Create the date-based data splitter
splitter = DateSplitter(
    train_start="20200101",
    train_end="20231231",
    val_start="20240101",
    val_end="20241231",
    test_start="20250101",
    test_end="20251231",
)

# Create the trainer and train
trainer = Trainer(
    model=model,
    splitter=splitter,
    target_col="future_return_5",
    feature_cols=["ma_5", "ma_20", "volume_ratio"],
)

# `data` is not defined in this snippet; it must contain the target and feature columns above
trainer.train(data)
results = trainer.get_results()
```

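The `DateSplitter` arguments above describe three non-overlapping, chronologically ordered date ranges. The sketch below shows the general idea of such a split in plain Python. It is not ProStock's `DateSplitter`; the function `split_by_date` and the `trade_date` column name are assumptions for illustration.

```python
def split_by_date(rows, bounds, date_key="trade_date"):
    """Assign each row to the first range that contains its date.

    Dates are YYYYMMDD strings, which compare chronologically as plain strings.
    Rows that fall outside every range are dropped.
    """
    parts = {name: [] for name in bounds}
    for row in rows:
        for name, (start, end) in bounds.items():
            if start <= row[date_key] <= end:
                parts[name].append(row)
                break
    return parts


# Same ranges as the DateSplitter example above.
bounds = {
    "train": ("20200101", "20231231"),
    "val": ("20240101", "20241231"),
    "test": ("20250101", "20251231"),
}

# Made-up rows; only the date matters for the split.
rows = [
    {"trade_date": "20230615", "ma_5": 1.0},
    {"trade_date": "20240102", "ma_5": 1.1},
    {"trade_date": "20250310", "ma_5": 0.9},
    {"trade_date": "20190101", "ma_5": 0.8},  # before train_start, dropped
]
parts = split_by_date(rows, bounds)
```

Splitting by date rather than at random keeps every validation and test row strictly later than the training rows, which avoids leaking future information into the model.
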
## Project Structure

```
ProStock/
├── src/
│   ├── config/              # Configuration management
│   ├── data/                # Data fetching and storage
│   │   ├── api_wrappers/    # Tushare API wrappers
│   │   ├── storage.py       # DuckDB storage
│   │   └── sync.py          # Data sync scheduling
│   ├── factors/             # Factor computation framework
│   │   ├── engine/          # Execution engine
│   │   ├── metadata/        # Factor metadata management
│   │   ├── dsl.py           # DSL expression layer
│   │   └── translator.py    # Polars translator
│   └── training/            # Training module
│       ├── core/            # Training core
│       └── components/      # Components (models, processors, filters)
├── tests/                   # Test files
├── data/                    # Data storage
└── docs/                    # Documentation
```

## Factor Framework

### Supported functions

**Time-series functions (ts_*)**:
- `ts_mean`, `ts_std`, `ts_max`, `ts_min`, `ts_sum`
- `ts_delay`, `ts_delta`
- `ts_corr`, `ts_cov`, `ts_rank`

**Cross-sectional functions (cs_*)**:
- `cs_rank` - cross-sectional rank
- `cs_zscore` - z-score standardization
- `cs_neutralize` - industry / market-cap neutralization
- `cs_winsorize` - winsorization

**Math functions**:
- `log`, `exp`, `sqrt`, `sign`, `abs`
- `max_`, `min_`, `clip`
- `if_`, `where`

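The README lists these functions without defining them. As a hedged reference, the sketch below gives conventional plain-Python definitions for four of them. The exact conventions ProStock uses are not stated here, so the choices below (population standard deviation for the z-score, mean ± k standard deviations as winsorization bounds, `None` for missing history) are assumptions, not the library's behavior.

```python
from statistics import mean, pstdev


def ts_delay(values, n):
    """Value from n periods earlier; None where there is no history yet."""
    return [values[i - n] if i >= n else None for i in range(len(values))]


def ts_delta(values, n):
    """Change versus n periods earlier: values[t] - values[t - n]."""
    lagged = ts_delay(values, n)
    return [None if p is None else v - p for v, p in zip(values, lagged)]


def cs_zscore(values):
    """Standardize one day's values across stocks to mean 0 and std 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def cs_winsorize(values, k=3.0):
    """Clip one day's values to mean ± k standard deviations.

    Assumption: other conventions (quantile or MAD based) are equally common.
    """
    mu, sigma = mean(values), pstdev(values)
    lower, upper = mu - k * sigma, mu + k * sigma
    return [min(max(v, lower), upper) for v in values]


# Made-up inputs for illustration.
lagged = ts_delay([1, 2, 3, 4], 1)
changes = ts_delta([1.0, 2.0, 4.0], 1)
z = cs_zscore([1.0, 2.0, 3.0])
clipped = cs_winsorize([1.0, 2.0, 3.0, 100.0], k=1.0)
```

With these inputs the outlier `100.0` is pulled down to the upper bound while the other three values pass through unchanged.
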
### Factor metadata management

```python
from src.factors.metadata import FactorManager

# Initialize the manager
manager = FactorManager()

# Add a factor
manager.add_factor({
    "factor_id": "F_001",
    "name": "mom_5d",
    "desc": "5-day price momentum",
    "dsl": "cs_rank(close / ts_delay(close, 5) - 1)",
    "category": "momentum",
})

# Query factors by name
df = manager.get_factors_by_name("mom_5d")
```

## Common Tasks

```bash
# Run all tests
uv run pytest

# Sync financial data
uv run python -c "from src.data.api_wrappers.financial_data import sync_financial; sync_financial()"

# Register factors in bulk
uv run python src/scripts/register_factors.py
```

## Dependencies

- pandas >= 2.0.0
- polars >= 0.20.0
- numpy >= 1.24.0
- tushare >= 2.0.0
- pydantic >= 2.0.0
- lightgbm >= 4.0.0
- pytest

## Documentation

For more details, see the `docs/` directory:

- [Factor expression reference](docs/factor_expressions_document.md)
- [API interface specification](docs/api/API_INTERFACE_SPEC.md)
- [Financial data API](docs/api/FINANCIAL_API_SPEC.md)

## License

MIT