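refactor: code review fixes - date filtering, performance optimization, data-leakage prevention

- Fix the date filtering of financial data in data_loader.py and support loading by date range
- Optimize MADClipper to use window functions instead of a join, improving performance
- Fix the training date boundary issue by adding a 1-day gap between splits to avoid data leakage
- Add .gitignore rules that ignore the training output directory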
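
The MADClipper change is described but not shown in this hunk, so the following is only a minimal sketch of the general idea, assuming a pandas DataFrame. The function name `mad_clip`, the column names `trade_date` and `factor`, and the `n_mad` parameter are hypothetical and not taken from the commit. It computes the per-group median and median absolute deviation with `groupby(...).transform(...)`, which broadcasts each group statistic back onto the original rows (the window-function pattern), so no aggregated table has to be joined back.

```python
import pandas as pd


def mad_clip(df: pd.DataFrame, value_col: str,
             group_col: str = "trade_date", n_mad: float = 3.0) -> pd.DataFrame:
    """Clip value_col to median +/- n_mad * MAD within each group.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    # Per-row group median, aligned to df's index (no aggregate-then-join step).
    median = df.groupby(group_col)[value_col].transform("median")

    # Median absolute deviation per group, again broadcast to every row.
    abs_dev = (df[value_col] - median).abs()
    mad = abs_dev.groupby(df[group_col]).transform("median")

    # Row-wise bounds; Series.clip accepts Series thresholds aligned on the index.
    lower = median - n_mad * mad
    upper = median + n_mad * mad

    out = df.copy()
    out[value_col] = df[value_col].clip(lower=lower, upper=upper)
    return out


if __name__ == "__main__":
    frame = pd.DataFrame({
        "trade_date": ["20240102"] * 5,
        "factor": [1.0, 2.0, 3.0, 4.0, 100.0],
    })
    print(mad_clip(frame, "factor"))
```

Whether this matches the actual implementation depends on the DataFrame library the project uses; the same pattern exists as window expressions in SQL and in other DataFrame libraries.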
2026-02-25 21:11:19 +08:00
parent 593ec99466
commit a9e4746239
24 changed files with 3597 additions and 56 deletions

@@ -12,13 +12,16 @@ from src.training.pipeline import run_training
 if __name__ == "__main__":
     # Run the full training pipeline
-    # Training set: 20180101 - 20230101
-    # Test set: 20230101 - 20240101
+    # Training set: 20190101 - 20231231
+    # Validation set: 20240102 - 20240531 (1-day gap after the training set to avoid data leakage)
+    # Test set: 20240602 - 20241231 (1-day gap after the validation set to avoid data leakage)
     result = run_training(
         train_start="20190101",
-        train_end="20250101",
-        test_start="20250101",
-        test_end="20260101",
+        train_end="20231231",
+        val_start="20240102",
+        val_end="20240531",
+        test_start="20240602",
+        test_end="20241231",
         top_n=5,
         output_path="output/top_stocks.tsv",
     )
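
The split boundaries in this hunk are plain `YYYYMMDD` strings, so the 1-day gap is easy to break by accident when the dates are edited later. As a rough illustration (not part of this commit), a small guard could verify the gaps before training starts. The helper name `check_split_gaps` and its `min_gap_days` parameter are hypothetical; only the date values come from the diff.

```python
from datetime import date, datetime


def parse_day(value: str) -> date:
    """Parse a YYYYMMDD string into a date."""
    return datetime.strptime(value, "%Y%m%d").date()


def check_split_gaps(splits, min_gap_days: int = 1) -> None:
    """Raise ValueError unless consecutive splits are separated by a gap.

    splits: list of (name, start, end) tuples in chronological order,
    with start and end given as YYYYMMDD strings (both inclusive).
    """
    previous_name, previous_end = None, None
    for name, start, end in splits:
        start_day, end_day = parse_day(start), parse_day(end)
        if start_day > end_day:
            raise ValueError(f"{name}: start {start} is after end {end}")
        if previous_end is not None:
            # Number of calendar days strictly between the two splits.
            skipped = (start_day - previous_end).days - 1
            if skipped < min_gap_days:
                raise ValueError(
                    f"{name} starts too close to {previous_name}: "
                    f"{skipped} day(s) skipped, need {min_gap_days}"
                )
        previous_name, previous_end = name, end_day


if __name__ == "__main__":
    # Boundaries introduced by this commit: passes silently.
    check_split_gaps([
        ("train", "20190101", "20231231"),
        ("val", "20240102", "20240531"),
        ("test", "20240602", "20241231"),
    ])
```

With the pre-commit values (`train_end="20250101"`, `test_start="20250101"`) the same check raises, because the two splits share a day.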