2026-02-21 03:43:30 +08:00

# ProStock Data Interface Wrapper Specification

## 1. Overview

This document defines the standard for wrapping new Tushare API interfaces. Every interface that is not classified as special must follow it, which ensures:

- Consistent code style
- Automatic sync support
- Consistent incremental-update logic
- Reduced storage write pressure
- Type safety (mandatory type hints)

### 1.1 Technology Stack

- **Storage layer**: DuckDB (high-performance embedded OLAP database)
- **Data format**: Pandas DataFrame / Polars DataFrame
- **Rate limiting**: token bucket algorithm (`TokenBucketRateLimiter`)
- **Concurrency**: multithreading via `ThreadPoolExecutor`
- **Type system**: Python 3.10+ type hints

### 1.2 Automation Support

The project provides the `prostock-api-interface` Skill to automate the wrapping workflow. Once an interface is defined in `api.md`, invoking the Skill generates:

- The data module file (`src/data/api_wrappers/api_{data_type}.py`)
- The database table management configuration
- The test file (`tests/test_{data_type}.py`)

## 2. Interface Categories

### 2.1 Special Interfaces (excluded from unified sync)

The following interfaces have their own sync logic and do not take part in the automatic sync mechanism:

| Interface type | File name | Notes |
|----------------|-----------|-------|
| Trade calendar | `api_trade_cal.py` | Global data, fetched by date range, cached in HDF5 |
| Stock basic info | `api_stock_basic.py` | One-time full fetch, stored as CSV |
| Auxiliary data | `api_industry`, `api_concept` | Updated infrequently, managed independently |

### 2.2 Standard Interfaces (must follow this specification)

All factor data, market data, financial data, and similar data fetched by stock or by date must follow this specification:

- By date: **preferred**; supports fetching the whole market in one batch
- By stock: use only when the API does not support fetching by date

## 3. File Structure Requirements

### 3.1 File Naming

```
api_{data_type}.py
```

- Examples: `api_daily.py`, `api_moneyflow.py`, `api_limit_list.py`
- **Must** start with the `api_` prefix
- Use lowercase letters and underscores

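The naming rules above can be checked mechanically. The following is a minimal sketch, not part of the project: the helper name `is_valid_wrapper_filename` and the exact regular expression are assumptions (for instance, it assumes digits are not allowed, which the rules do not state either way).

```python
import re

# Assumed pattern: the api_ prefix, lowercase words joined by single
# underscores, and a .py extension.
WRAPPER_FILENAME = re.compile(r"api_[a-z]+(?:_[a-z]+)*\.py")


def is_valid_wrapper_filename(name: str) -> bool:
    """Return True if name follows the api_{data_type}.py convention."""
    return WRAPPER_FILENAME.fullmatch(name) is not None
```

A check like this could run in a test or pre-commit hook so that misnamed modules are caught before review.
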
### 3.2 File Location

All interface files must be located in the `src/data/api_wrappers/` directory.

### 3.3 Export Requirements

Every new interface must be exported from `src/data/api_wrappers/__init__.py`:

```python
from src.data.api_wrappers.api_{data_type} import get_{data_type}

__all__ = [
    # ... other exports ...
    "get_{data_type}",
]
```

## 4. Interface Design Specification

### 4.1 Fetch Function Signature Requirements

The function must return a `pd.DataFrame`, and its parameters must follow one of the two forms below:

#### 4.1.1 Interfaces Fetched by Date (preferred)

Applies to: limit-up/limit-down lists, top lists (dragon-tiger list), chip distribution, daily indicators, and similar data.

**Signature requirements**:

```python
def get_{data_type}(
    trade_date: Optional[str] = None,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    ts_code: Optional[str] = None,
    # other optional parameters...
) -> pd.DataFrame:
    ...
```

**Requirements**:
- Prefer `trade_date` to fetch a single day of whole-market data
- Support `start_date` + `end_date` to fetch a date range
- `ts_code` is an optional filter parameter
- **Performance advantage**: a single day of whole-market data takes only one API call

#### 4.1.2 Interfaces Fetched by Stock

Applies to: daily quotes, money flow, and similar data.

**Signature requirements**:

```python
def get_{data_type}(
    ts_code: str,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    # other optional parameters...
) -> pd.DataFrame:
    ...
```

**Requirements**:
- `ts_code` is a required parameter
- Fetching whole-market data requires iterating over every stock

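Because a by-stock interface needs one call per stock, whole-market fetches are usually fanned out over a thread pool. The sketch below illustrates that loop with the standard library only. `fetch_all_stocks` and `fake_fetch` are hypothetical names, and rows are plain dictionaries rather than DataFrames so the example runs without dependencies.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Row = dict[str, str]


def fetch_all_stocks(
    ts_codes: list[str],
    fetch_one: Callable[[str], list[Row]],
    max_workers: int = 4,
) -> list[Row]:
    """Call fetch_one for every stock code concurrently and merge the rows."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as ts_codes
        per_stock = list(pool.map(fetch_one, ts_codes))
    return [row for rows in per_stock for row in rows]


def fake_fetch(ts_code: str) -> list[Row]:
    """Stand-in for a per-stock wrapper; returns one fixed row."""
    return [{"ts_code": ts_code, "trade_date": "20240102"}]


rows = fetch_all_stocks(["000001.SZ", "600000.SH"], fake_fetch)
```

In the real project each `fetch_one` call would go through the rate-limited client, so `max_workers` mainly bounds how many requests wait on the limiter at once.
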
### 4.2 Docstring Requirements

Functions must include a complete **Google-style** docstring that covers:

```python
def get_{data_type}(...) -> pd.DataFrame:
    """Fetch {data description} from Tushare.

    This interface retrieves {detailed description}.

    Args:
        ts_code: Stock code (e.g., '000001.SZ', '600000.SH')
        trade_date: Specific trade date (YYYYMMDD format)
        start_date: Start date (YYYYMMDD format)
        end_date: End date (YYYYMMDD format)
        # other parameters...

    Returns:
        pd.DataFrame with columns:
            - ts_code: Stock code
            - trade_date: Trade date (YYYYMMDD)
            - {other field}: {field description}

    Example:
        >>> # Get single date data for all stocks
        >>> data = get_{data_type}(trade_date='20240101')
        >>>
        >>> # Get date range data
        >>> data = get_{data_type}(start_date='20240101', end_date='20240131')
        >>>
        >>> # Get specific stock data
        >>> data = get_{data_type}(ts_code='000001.SZ', trade_date='20240101')
    """
```

### 4.3 Date Format Requirements

- All date parameters use the **YYYYMMDD** string format
- Use `trade_date` uniformly as the date field name
- If the API returns a different date field name (such as `date` or `end_date`), rename it to `trade_date` before returning:

```python
if "date" in data.columns:
    data = data.rename(columns={"date": "trade_date"})
```

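Validating the YYYYMMDD format before a request is sent can catch malformed dates early. The following is a minimal sketch using only the standard library. The helper name `is_yyyymmdd` is an assumption and does not refer to an existing project function.

```python
from datetime import datetime


def is_yyyymmdd(value: str) -> bool:
    """Return True if value is an 8-digit string naming a real calendar date."""
    # Check length and digits first: strptime alone would accept "2024011".
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True
```

Because zero-padded YYYYMMDD strings sort in calendar order, validated dates can also be compared directly as strings.
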
### 4.4 Stock Code Requirements

- Use `ts_code` uniformly as the stock code field name
- Format: `{code}.{exchange}`, for example `000001.SZ` or `600000.SH`
- Make sure the returned DataFrame contains a `ts_code` column

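A format check for `ts_code` might look like the sketch below. The pattern (six digits, a dot, two uppercase letters) is an assumption inferred from the two examples above; the specification itself only states `{code}.{exchange}`, so adjust the pattern if other code shapes are valid. `is_ts_code` is a hypothetical helper name.

```python
import re

# Assumed shape, inferred from examples such as 000001.SZ and 600000.SH.
TS_CODE_PATTERN = re.compile(r"[0-9]{6}\.[A-Z]{2}")


def is_ts_code(value: str) -> bool:
    """Return True if value looks like {code}.{exchange}."""
    return TS_CODE_PATTERN.fullmatch(value) is not None
```
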
### 4.5 Token Bucket Rate Limiting

All API calls must go through `TushareClient`, which satisfies the token bucket rate-limiting requirement automatically:

```python
import pandas as pd

from src.data.client import TushareClient


def get_{data_type}(...) -> pd.DataFrame:
    client = TushareClient()

    # Build parameters
    params = {}
    if trade_date:
        params["trade_date"] = trade_date
    if ts_code:
        params["ts_code"] = ts_code
    # ...

    # Fetch data (rate limiting handled automatically)
    data = client.query("{api_name}", **params)

    return data
```

**Note**: `TushareClient` automatically handles:
- Token bucket rate limiting
- API retry logic (exponential backoff)
- Configuration loading

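For readers unfamiliar with the token bucket algorithm, here is a minimal, non-blocking sketch. It is not the project's `TokenBucketRateLimiter`. The class name `SimpleTokenBucket`, its parameters, and the injectable clock are all assumptions made for illustration, and the sketch is not thread-safe.

```python
import time
from typing import Callable


class SimpleTokenBucket:
    """Hypothetical token bucket: refills `rate` tokens per second up to `capacity`."""

    def __init__(
        self,
        rate: float,
        capacity: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity  # start with a full bucket
        self._last = clock()

    def _refill(self) -> None:
        """Add tokens for the time elapsed since the last refill."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        self._last = now

    def try_acquire(self, tokens: float = 1.0) -> bool:
        """Take tokens if available; return False instead of blocking."""
        self._refill()
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False
```

The bucket allows short bursts up to `capacity` while holding the long-run request rate at `rate` per second. A production limiter would typically also guard the state with a lock and sleep until a token is available.
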
## 5. DuckDB Storage Specification

### 5.1 Storage Architecture

The project uses **DuckDB** for persistent storage:

- **Singleton**: the `Storage` class ensures a single database connection
- **Thread safety**: `ThreadSafeStorage` supports concurrent writes
- **UPSERT support**: `INSERT OR REPLACE` handles duplicate rows automatically
- **Predicate pushdown**: WHERE conditions are filtered in the database layer

### 5.2 Table Design

Each data type maps to one DuckDB table:

```sql
CREATE TABLE {data_type} (
    ts_code VARCHAR(16) NOT NULL,
    trade_date DATE NOT NULL,
    -- other columns...
    PRIMARY KEY (ts_code, trade_date)
);

CREATE INDEX idx_{data_type}_date_code ON {data_type}(trade_date, ts_code);
```

**Primary key requirements**:
- Must include `ts_code` and `trade_date`
- Use UPSERT to guarantee idempotency

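The idempotency that the composite primary key and UPSERT provide can be demonstrated in a few lines. The sketch below uses Python's built-in `sqlite3` purely as a stand-in, because it ships with Python and accepts the same `INSERT OR REPLACE` statement. It is not the project's DuckDB storage code, and the `daily` table and `close` column are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE daily (
        ts_code TEXT NOT NULL,
        trade_date TEXT NOT NULL,
        close REAL,
        PRIMARY KEY (ts_code, trade_date)
    )
    """
)


def upsert(rows: list[tuple[str, str, float]]) -> None:
    """Insert rows, replacing any existing row with the same primary key."""
    conn.executemany(
        "INSERT OR REPLACE INTO daily (ts_code, trade_date, close) VALUES (?, ?, ?)",
        rows,
    )


upsert([("000001.SZ", "20240102", 10.0)])
# Same (ts_code, trade_date) key: the row is replaced, not duplicated.
upsert([("000001.SZ", "20240102", 10.5)])
stored = conn.execute("SELECT ts_code, trade_date, close FROM daily").fetchall()
```

Re-running a sync over an overlapping date range therefore leaves one row per key, which is what makes retries and overlapping fetches safe.
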
### 5.3 Storage Write Strategy

**Batch write mode** (recommended for multithreaded scenarios):

```python
from src.data.storage import ThreadSafeStorage


def sync_{data_type}(self, ...):
    storage = ThreadSafeStorage()

    # Queue the data (nothing is written yet)
    for data_chunk in data_generator:
        storage.queue_save("{data_type}", data_chunk)

    # Write all queued data in one batch
    storage.flush()
```

**Direct write mode** (for simple scenarios):

```python
from src.data.storage import Storage

storage = Storage()
storage.save("{data_type}", data, mode="append")
```

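To show why queueing followed by a single flush suits multithreaded fetching, here is a small self-contained sketch of the pattern. `QueuedWriter` is a hypothetical stand-in, not the project's `ThreadSafeStorage`. It keeps rows in memory and only counts flushes, where a real backend would issue one batched database write.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class QueuedWriter:
    """Hypothetical queue-then-flush writer backed by in-memory lists."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: dict[str, list[dict]] = defaultdict(list)
        self.written: dict[str, list[dict]] = defaultdict(list)
        self.flush_count = 0

    def queue_save(self, table: str, rows: list[dict]) -> None:
        """Buffer rows for a table; safe to call from worker threads."""
        with self._lock:
            self._pending[table].extend(rows)

    def flush(self) -> None:
        """Move everything queued so far into the backing store."""
        with self._lock:
            pending, self._pending = self._pending, defaultdict(list)
        for table, rows in pending.items():
            # A real backend would perform one batched write per table here.
            self.written[table].extend(rows)
        self.flush_count += 1


writer = QueuedWriter()
codes = ["000001.SZ", "600000.SH", "000002.SZ"]
with ThreadPoolExecutor(max_workers=3) as pool:
    list(pool.map(lambda code: writer.queue_save("daily", [{"ts_code": code}]), codes))
writer.flush()
```

Worker threads only touch the in-memory queue, so the database sees one write per flush instead of one per fetched chunk.
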
### 5.4 Data Type Mapping

Standard field type mapping (`DEFAULT_TYPE_MAPPING`):

```python
DEFAULT_TYPE_MAPPING = {
    "ts_code": "VARCHAR(16)",
    "trade_date": "DATE",
    "open": "DOUBLE",
    "high": "DOUBLE",
    "low": "DOUBLE",
    "close": "DOUBLE",
    "vol": "DOUBLE",
    "amount": "DOUBLE",
    # ... other fields
}
```

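A column-to-type mapping like this can be turned into a `CREATE TABLE` statement mechanically. The sketch below shows one way to do that. `build_create_table_sql` is a hypothetical helper, not the project's `ensure_table`, and it assumes the `(ts_code, trade_date)` primary key from section 5.2.

```python
def build_create_table_sql(
    table: str,
    columns: dict[str, str],
    primary_key: tuple[str, ...] = ("ts_code", "trade_date"),
) -> str:
    """Render a CREATE TABLE statement from a column-to-type mapping."""
    missing = [key for key in primary_key if key not in columns]
    if missing:
        raise ValueError(f"primary key columns missing from schema: {missing}")

    lines = []
    for name, sql_type in columns.items():
        # Primary key columns must not be NULL
        suffix = " NOT NULL" if name in primary_key else ""
        lines.append(f"    {name} {sql_type}{suffix}")
    lines.append(f"    PRIMARY KEY ({', '.join(primary_key)})")
    body = ",\n".join(lines)
    return f"CREATE TABLE IF NOT EXISTS {table} (\n{body}\n);"


ddl = build_create_table_sql(
    "daily",
    {"ts_code": "VARCHAR(16)", "trade_date": "DATE", "close": "DOUBLE"},
)
```

The sketch interpolates names directly into SQL, so it is only suitable for trusted, code-defined schemas.
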
## 6. Sync Integration Specification

### 6.1 Syncing with db_manager

The `db_manager` module provides the project's high-level sync functionality:

```python
from typing import Optional

import pandas as pd

from src.data.api_wrappers import get_{data_type}
from src.data.db_manager import SyncManager, ensure_table


def sync_{data_type}(
    force_full: bool = False,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
) -> pd.DataFrame:
    """Sync {data description} to DuckDB."""
    manager = SyncManager()

    # Make sure the table exists
    ensure_table("{data_type}", schema={
        "ts_code": "VARCHAR(16)",
        "trade_date": "DATE",
        # ... other fields
    })

    # Run the sync
    result = manager.sync(
        table_name="{data_type}",
        fetch_func=get_{data_type},
        start_date=start_date,
        end_date=end_date,
        force_full=force_full,
    )

    return result
```

### 6.2 Incremental Update Logic

`SyncManager` handles incremental updates automatically:

1. **Check the latest local date**: read `MAX(trade_date)` from DuckDB
2. **Fetch the trade calendar**: get the trading-day range from `api_trade_cal`
3. **Compute the dates to sync**: from the day after the latest local date through the latest trading day
4. **Fetch data in batches**: by date or by stock
5. **Write in batches**: queue the writes through `ThreadSafeStorage`

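Steps 1 to 3 come down to a small date computation. The sketch below shows it under stated assumptions and is not the `SyncManager` implementation. The function name `dates_to_sync` is hypothetical, an empty table is treated like a full reload, and the `20180101` start date is taken from the `force_full` description in this document.

```python
from typing import Optional

FULL_RELOAD_START = "20180101"  # start date used when force_full is requested


def dates_to_sync(
    trade_days: list[str],
    local_latest: Optional[str],
    force_full: bool = False,
) -> list[str]:
    """Return the trading days (YYYYMMDD) that still need to be fetched."""
    if force_full or local_latest is None:
        # Full reload: every trading day from the configured start date
        return sorted(day for day in trade_days if day >= FULL_RELOAD_START)
    # YYYYMMDD strings sort in calendar order, so string comparison is enough
    return sorted(day for day in trade_days if day > local_latest)
```

Filtering against the trade calendar, rather than stepping through calendar days, avoids issuing requests for weekends and holidays.
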
### 6.3 Convenience Functions

Every interface must provide a top-level convenience function:

```python
def sync_{data_type}(force_full: bool = False) -> pd.DataFrame:
    """Sync {data description} to local storage.

    Args:
        force_full: If True, force full reload from 20180101

    Returns:
        DataFrame with synced data
    """
    # Implementation...
```

## 7. Code Templates

### 7.1 By-Date Interface Template

```python
"""{data description} interface.

Fetch {data description} data from Tushare.
"""

from typing import Optional

import pandas as pd

from src.data.client import TushareClient


def get_{data_type}(
    trade_date: Optional[str] = None,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    ts_code: Optional[str] = None,
) -> pd.DataFrame:
    """Fetch {data description} from Tushare.

    This interface retrieves {detailed description}.

    Args:
        trade_date: Specific trade date (YYYYMMDD format)
        start_date: Start date (YYYYMMDD format)
        end_date: End date (YYYYMMDD format)
        ts_code: Stock code filter (optional)

    Returns:
        pd.DataFrame with columns:
            - ts_code: Stock code
            - trade_date: Trade date (YYYYMMDD)
            - {field 1}: {description}
            - {field 2}: {description}

    Example:
        >>> # Get all stocks for a single date
        >>> data = get_{data_type}(trade_date='20240101')
        >>>
        >>> # Get date range data
        >>> data = get_{data_type}(start_date='20240101', end_date='20240131')
    """
    client = TushareClient()

    # Build parameters
    params = {}
    if trade_date:
        params["trade_date"] = trade_date
    if start_date:
        params["start_date"] = start_date
    if end_date:
        params["end_date"] = end_date
    if ts_code:
        params["ts_code"] = ts_code

    # Fetch data
    data = client.query("{tushare_api_name}", **params)

    # Rename date column if needed
    if "date" in data.columns:
        data = data.rename(columns={"date": "trade_date"})

    return data
```

### 7.2 By-Stock Interface Template

```python
"""{data description} interface.

Fetch {data description} data from Tushare (per stock).
"""

from typing import Optional

import pandas as pd

from src.data.client import TushareClient


def get_{data_type}(
    ts_code: str,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
) -> pd.DataFrame:
    """Fetch {data description} for a specific stock.

    Args:
        ts_code: Stock code (e.g., '000001.SZ')
        start_date: Start date (YYYYMMDD format)
        end_date: End date (YYYYMMDD format)

    Returns:
        pd.DataFrame with {data description} data
    """
    client = TushareClient()

    params = {"ts_code": ts_code}
    if start_date:
        params["start_date"] = start_date
    if end_date:
        params["end_date"] = end_date

    data = client.query("{tushare_api_name}", **params)

    return data
```

### 7.3 Sync Function Template

```python
import pandas as pd

from src.data.api_wrappers import get_{data_type}
from src.data.db_manager import SyncManager, ensure_table


def sync_{data_type}(force_full: bool = False) -> pd.DataFrame:
    """Sync {data description} to local DuckDB storage.

    Args:
        force_full: If True, force full reload from 20180101

    Returns:
        DataFrame with synced data
    """
    manager = SyncManager()

    # Ensure table exists with proper schema
    ensure_table("{data_type}", schema={
        "ts_code": "VARCHAR(16)",
        "trade_date": "DATE",
        # Add other fields...
    })

    # Perform sync
    result = manager.sync(
        table_name="{data_type}",
        fetch_func=get_{data_type},
        force_full=force_full,
    )

    return result
```

## 8. Testing Specification

### 8.1 Test File Requirements

A matching test file must be created: `tests/test_{data_type}.py`

### 8.2 Test Coverage Requirements

```python
import pytest
import pandas as pd
from unittest.mock import patch, MagicMock

from src.data.api_wrappers.api_{data_type} import get_{data_type}


class Test{DataType}:
    """Test suite for {data_type} API wrapper."""

    @patch("src.data.api_wrappers.api_{data_type}.TushareClient")
    def test_get_by_date(self, mock_client_class):
        """Test fetching data by date."""
        # Setup mock
        mock_client = MagicMock()
        mock_client_class.return_value = mock_client
        mock_client.query.return_value = pd.DataFrame({
            "ts_code": ["000001.SZ"],
            "trade_date": ["20240101"],
            # ... other columns
        })

        # Test
        result = get_{data_type}(trade_date="20240101")

        # Assert
        assert not result.empty
        assert "ts_code" in result.columns
        assert "trade_date" in result.columns
        mock_client.query.assert_called_once()

    @patch("src.data.api_wrappers.api_{data_type}.TushareClient")
    def test_get_by_stock(self, mock_client_class):
        """Test fetching data by stock code."""
        # Similar setup...
        pass

    @patch("src.data.api_wrappers.api_{data_type}.TushareClient")
    def test_empty_response(self, mock_client_class):
        """Test handling empty response."""
        mock_client = MagicMock()
        mock_client_class.return_value = mock_client
        mock_client.query.return_value = pd.DataFrame()

        result = get_{data_type}(trade_date="20240101")
        assert result.empty
```

### 8.3 Mocking Rules

- Patch the name where it is imported and used: `patch('src.data.api_wrappers.api_{data_type}.TushareClient')`
- Test both normal and error cases
- Verify that parameters are passed through correctly

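Verifying parameter pass-through usually means asserting on the exact arguments the mocked client received, not only that it was called. The sketch below shows the idea with `unittest.mock` alone. `fetch_by_date` is a hypothetical wrapper that takes a client factory so the example runs without the project package, and the `"daily"` API name is illustrative.

```python
from typing import Any, Callable, Optional
from unittest.mock import MagicMock


def fetch_by_date(
    client_factory: Callable[[], Any],
    trade_date: Optional[str] = None,
    ts_code: Optional[str] = None,
) -> Any:
    """Hypothetical wrapper: forwards only the arguments that were provided."""
    params = {}
    if trade_date is not None:
        params["trade_date"] = trade_date
    if ts_code is not None:
        params["ts_code"] = ts_code
    return client_factory().query("daily", **params)


mock_client = MagicMock()
mock_client.query.return_value = [{"ts_code": "000001.SZ", "trade_date": "20240102"}]

result = fetch_by_date(lambda: mock_client, trade_date="20240102")

# Fails if ts_code (or any other unexpected keyword) had been forwarded.
mock_client.query.assert_called_once_with("daily", trade_date="20240102")
```

`assert_called_once_with` is stricter than `assert_called_once`: it also catches a wrapper that forwards `None` values or drops a parameter.
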
## 9. Generating Code with the Skill

### 9.1 Preparation

1. Define the interface in `api.md`, including:
   - Interface name and description
   - Input parameters (name, type, required or optional, description)
   - Output parameters (name, type, description)

### 9.2 Invoking the Skill

Tell Claude which interface to wrap:

> "Help me wrap the {data_type} interface"

> "Generate code for the {data_type} interface"

### 9.3 Generated Output

The Skill automatically:

1. Parses the interface definition in `api.md`
2. Generates `src/data/api_wrappers/api_{data_type}.py`
3. Updates the exports in `src/data/api_wrappers/__init__.py`
4. Generates the test file `tests/test_{data_type}.py`
5. Provides a `sync_{data_type}()` function template

## 10. Checklist

### 10.1 File Structure

- [ ] The file is located at `src/data/api_wrappers/api_{data_type}.py`
- [ ] `src/data/api_wrappers/__init__.py` has been updated to export the public interface
- [ ] The test file `tests/test_{data_type}.py` has been created

### 10.2 Interface Implementation

- [ ] The fetch function uses `TushareClient`
- [ ] The function has a complete Google-style docstring
- [ ] Date parameters use the `YYYYMMDD` format
- [ ] The returned DataFrame contains the `ts_code` and `trade_date` fields
- [ ] The by-date interface is implemented first (if the API supports it)
- [ ] Parameters are checked for `None` before being passed on

### 10.3 Storage Integration

- [ ] Data is stored through `Storage` or `ThreadSafeStorage`
- [ ] The table uses `ts_code` and `trade_date` as its primary key
- [ ] UPSERT mode (`INSERT OR REPLACE`) is used
- [ ] Multithreaded scenarios use the `queue_save()` + `flush()` pattern

### 10.4 Sync Integration

- [ ] Sync is managed through the `db_manager` module
- [ ] The `sync_{data_type}()` convenience function is implemented
- [ ] The `force_full` parameter is supported
- [ ] The incremental-update logic is correct

### 10.5 Testing

- [ ] Unit tests have been written
- [ ] `TushareClient` is mocked
- [ ] Tests cover both by-date and by-stock fetching
- [ ] Tests cover both normal and error cases

## 11. Reference Examples

### 11.1 Full Example: api_daily.py

See `src/data/api_wrappers/api_daily.py` - a complete implementation that fetches daily bar data by stock.

### 11.2 Full Example: api_trade_cal.py

See `src/data/api_wrappers/api_trade_cal.py` - the implementation of a special interface (trade calendar), including the HDF5 caching logic.

### 11.3 Full Example: api_stock_basic.py

See `src/data/api_wrappers/api_stock_basic.py` - the implementation of a special interface (stock basic info), including the CSV storage logic.

---

**Last updated**: 2026-02-23

**Version**: v2.0 - updated the DuckDB storage specification and added the Skill automation notes