refactor: migrate storage layer to DuckDB + restructure modules

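- Storage layer refactor: HDF5 → DuckDB (UPSERT mode, thread-safe storage)
- Sync class migration: DataSync moved from sync.py to api_daily.py (separation of responsibilities)
- Model module refactor: src/models → src/pipeline (clearer naming)
- New factor modules: factors/momentum (MA, return rank), factors/financial
- New API interfaces: api_namechange, api_bak_basic
- New training entry point: training module (main.py, pipeline configuration)
- Utility functions consolidated: get_today_date and related helpers moved to utils.py
- Documentation update: architecture change history added to AGENTS.md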
2026-02-23 16:23:53 +08:00
parent 9f95be56a0
commit 593ec99466
32 changed files with 4181 additions and 1395 deletions
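
The commit message names an UPSERT mode for the new storage layer but this excerpt does not show how it is implemented. As a rough sketch of the idea only: rows are keyed by a primary key, and writing a row whose key already exists overwrites it instead of failing or duplicating. The example below uses the standard-library sqlite3 module as a stand-in for DuckDB (DuckDB documents a similar ON CONFLICT clause). The table name, column names, and the upsert_rows helper are hypothetical and are not taken from the project.

```python
import sqlite3

# Stand-in for the storage layer: the project itself uses DuckDB; sqlite3 is
# used here only because it ships with Python. Table and column names are
# hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE daily (
        ts_code    TEXT NOT NULL,
        trade_date TEXT NOT NULL,
        close      REAL,
        PRIMARY KEY (ts_code, trade_date)
    )
    """
)

UPSERT_SQL = """
    INSERT INTO daily (ts_code, trade_date, close)
    VALUES (?, ?, ?)
    ON CONFLICT (ts_code, trade_date)
    DO UPDATE SET close = excluded.close
"""


def upsert_rows(connection, rows):
    """Insert new rows and overwrite rows whose key already exists."""
    with connection:  # commits on success, rolls back on error
        connection.executemany(UPSERT_SQL, rows)


# First sync writes two rows; the second sync revises one of them.
upsert_rows(conn, [("000001.SZ", "20240102", 9.5), ("000002.SZ", "20240102", 10.0)])
upsert_rows(conn, [("000001.SZ", "20240102", 9.8)])

stored = conn.execute(
    "SELECT ts_code, trade_date, close FROM daily ORDER BY ts_code"
).fetchall()
```

After both calls the table still holds two rows, and the row for 000001.SZ carries the revised value. This makes a repeated sync idempotent, which is one plausible reason to prefer UPSERT over plain appends.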

src/training/__init__.py

@@ -0,0 +1,46 @@
"""ProStock training pipeline module.

This module provides the complete model training workflow:
1. Data processing: Fillna(0) -> Dropna
2. Model training: LightGBM classification model
3. Prediction and stock selection: daily top-5 stock pool

Usage example:
    from src.training import run_training

    # Run the complete training workflow
    result = run_training(
        train_start="20180101",
        train_end="20230101",
        test_start="20230101",
        test_end="20240101",
        top_n=5,
        output_path="output/top_stocks.tsv"
    )

Factor usage:
    from src.factors import MovingAverageFactor, ReturnRankFactor

    ma5 = MovingAverageFactor(period=5)    # 5-day moving average
    ma10 = MovingAverageFactor(period=10)  # 10-day moving average
    ret5 = ReturnRankFactor(period=5)      # 5-day return rank
"""

from src.training.pipeline import (
    create_pipeline,
    predict_top_stocks,
    prepare_data,
    run_training,
    save_top_stocks,
    train_model,
)

__all__ = [
    # Pipeline functions
    "prepare_data",
    "create_pipeline",
    "train_model",
    "predict_top_stocks",
    "save_top_stocks",
    "run_training",
]
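
The module docstring above describes the last pipeline step as a daily top-5 stock pool. The body of predict_top_stocks is not part of this hunk, so the following is only a minimal sketch of that selection step, assuming the model yields one score per stock per trading day. The select_top_n function, the row layout, and the tie-breaking rule are illustrative assumptions, not the project's code.

```python
from collections import defaultdict


def select_top_n(scored_rows, top_n=5):
    """Return {trade_date: [ts_code, ...]} with the top_n highest scores per day.

    scored_rows is an iterable of (trade_date, ts_code, score) tuples.
    Ties are broken by ts_code so the result is deterministic (an assumption).
    """
    by_date = defaultdict(list)
    for trade_date, ts_code, score in scored_rows:
        by_date[trade_date].append((score, ts_code))

    result = {}
    for trade_date in sorted(by_date):
        # Highest score first; equal scores fall back to code order.
        ranked = sorted(by_date[trade_date], key=lambda item: (-item[0], item[1]))
        result[trade_date] = [ts_code for _, ts_code in ranked[:top_n]]
    return result


# Made-up scores for illustration only.
scores = [
    ("20230103", "000001.SZ", 0.61),
    ("20230103", "000002.SZ", 0.87),
    ("20230103", "600000.SH", 0.42),
    ("20230104", "000001.SZ", 0.30),
    ("20230104", "600000.SH", 0.75),
]
pool = select_top_n(scores, top_n=2)
```

With top_n=2, each date keeps its two highest-scoring codes in descending score order. Grouping by date before ranking matters: ranking across the whole test window would let a single high-scoring day crowd out every other day.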