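docs: Improve the financial data API specification

- Update FINANCIAL_API_SPEC.md with the first-sync optimization strategy
- Add the date format conversion rule (YYYYMMDD → YYYY-MM-DD)
- Document disabling UPSERT in the storage layer and how delete counts are handled
- Extend the FAQ (Q7-Q9)
- Improve financial_api.md with complete documentation for the balance sheet endpoint and an explanation of report types

Closes: documentation update v1.1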
@@ -305,7 +305,7 @@ def sync_full(self, dry_run: bool = False) -> List[Dict]:
 
 ### Single-quarter sync strategy
 
-**Spec**: Single-quarter sync uses a "delete, then insert" strategy.
+**Spec**: Single-quarter sync uses a "delete, then insert" strategy and optimizes the first-sync case.
 
 **Flow**:
 
@@ -319,24 +319,41 @@ def sync_quarter(self, period: str, dry_run: bool = False) -> Dict:
     if self.TARGET_REPORT_TYPE and 'report_type' in remote_df.columns:
         remote_df = remote_df[remote_df['report_type'] == self.TARGET_REPORT_TYPE]
 
-    # 3. Compare and find the stocks that differ
+    remote_total = len(remote_df)
+
+    # 3. Check whether this quarter already has local data (first-sync optimization)
+    local_counts = self.get_local_data_count_by_stock(period)
+    is_first_sync_for_period = len(local_counts) == 0
+
+    if is_first_sync_for_period:
+        # First sync: insert everything directly and skip difference detection
+        print(f"[{self.__class__.__name__}] First sync for quarter {period}, inserting all data directly")
+        if not dry_run:
+            self.storage.queue_save(self.table_name, remote_df, use_upsert=False)
+            self.storage.flush()
+        return {...}
+
+    # 4. Not the first sync: compare and find the stocks that differ
     diff_df, stats_df = self.compare_and_find_differences(remote_df, period)
 
-    # 4. Run the sync (delete, then insert)
+    # 5. Run the sync (delete, then insert)
     if not dry_run and not diff_df.empty:
         diff_stocks = list(diff_df['ts_code'].unique())
 
-        # 4.1 Delete the old rows of the differing stocks
+        # 5.1 Delete the old rows of the differing stocks
         self.delete_stock_quarter_data(period, diff_stocks)
 
-        # 4.2 Insert the new rows
-        self.storage.queue_save(self.table_name, diff_df)
+        # 5.2 Insert the new rows (use_upsert=False is required)
+        self.storage.queue_save(self.table_name, diff_df, use_upsert=False)
         self.storage.flush()
 
     return {...}
 ```
 
-**Important**: UPSERT (INSERT OR REPLACE) is forbidden; always use "delete, then insert".
+**Important**:
+1. UPSERT (INSERT OR REPLACE) is forbidden; always use "delete, then insert"
+2. **First-sync optimization**: when there is no local data, insert directly without difference detection to improve performance
+3. **`use_upsert=False` is required**: pass it explicitly when calling `queue_save()` to avoid triggering the UPSERT error
 
 ---
 
@@ -436,6 +453,149 @@ def delete_stock_quarter_data(
    return result.rowcount
```

### Handling delete counts

**Note**: In DuckDB, the `rowcount` attribute of a DELETE may return `-1` (meaning the count is unknown), so it needs special handling.

**Improved approach**:

```python
def delete_stock_quarter_data(self, period: str, ts_codes: Optional[List[str]] = None) -> int:
    """Delete the rows for the given quarter and stocks."""
    storage = Storage()

    try:
        # Convert YYYYMMDD to YYYY-MM-DD (required by the DuckDB DATE type)
        period_formatted = f"{period[:4]}-{period[4:6]}-{period[6:]}"

        if ts_codes:
            # Delete the rows of the given stocks
            placeholders = ', '.join(['?' for _ in ts_codes])
            query = f'''
                DELETE FROM "{self.table_name}"
                WHERE end_date = ? AND ts_code IN ({placeholders})
            '''
            storage._connection.execute(query, [period_formatted] + ts_codes)
            # DuckDB rowcount returns -1, so use the number of stocks passed in as an estimate
            return len(ts_codes)
        else:
            # Delete the whole quarter
            query = f'DELETE FROM "{self.table_name}" WHERE end_date = ?'
            storage._connection.execute(query, [period_formatted])
            return -1  # unknown count
    except Exception as e:
        print(f"[{self.__class__.__name__}] Error deleting data: {e}")
        return 0
```
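
Returning `len(ts_codes)` reports the number of stocks, not the number of rows removed. If an exact row count is needed, one option is to count the matching rows immediately before deleting them. The sketch below illustrates that idea and is not the project's implementation. It assumes Python's built-in `sqlite3` as a stand-in for DuckDB so that it runs without extra dependencies, and the table `income_quarter`, its columns, and the helper `delete_with_exact_count` are hypothetical.

```python
import sqlite3
from typing import List


def delete_with_exact_count(conn: sqlite3.Connection, table: str,
                            period: str, ts_codes: List[str]) -> int:
    """Count the matching rows, delete them, and return the count.

    `table` is interpolated into the SQL text, so it must come from
    trusted code, never from user input.
    """
    if not ts_codes:
        return 0
    period_formatted = f"{period[:4]}-{period[4:6]}-{period[6:]}"
    placeholders = ", ".join("?" for _ in ts_codes)
    where = f"WHERE end_date = ? AND ts_code IN ({placeholders})"
    params = [period_formatted] + list(ts_codes)

    # Count first, because the driver's rowcount may be unreliable.
    count = conn.execute(
        f'SELECT COUNT(*) FROM "{table}" {where}', params
    ).fetchone()[0]
    conn.execute(f'DELETE FROM "{table}" {where}', params)
    return count


# Demo with an in-memory database (hypothetical table and data).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "income_quarter" (ts_code TEXT, end_date TEXT, revenue REAL)')
conn.executemany(
    'INSERT INTO "income_quarter" VALUES (?, ?, ?)',
    [
        ("000001.SZ", "2024-03-31", 1.0),
        ("000001.SZ", "2024-03-31", 2.0),
        ("000002.SZ", "2024-03-31", 3.0),
        ("000001.SZ", "2023-12-31", 4.0),
    ],
)
deleted = delete_with_exact_count(conn, "income_quarter", "20240331", ["000001.SZ"])
remaining = conn.execute('SELECT COUNT(*) FROM "income_quarter"').fetchone()[0]
print(deleted, remaining)
```

The extra `SELECT COUNT(*)` costs one more query per delete, and the count is only exact if no other writer touches the same rows between the two statements.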

**Improved log output**:

```python
# Improved log output
if not dry_run and not diff_df.empty:
    deleted_stocks_count = len(diff_stocks)
    self.delete_stock_quarter_data(period, diff_stocks)
    deleted_count = len(diff_df)
    print(f"[{self.__class__.__name__}] Deleted {deleted_stocks_count} stocks' old records (approx {deleted_count} rows)")
```

Example output:
```
[IncomeQuarterSync] Deleted 100 stocks' old records (approx 500 rows)
```

---

## Date format conversion

### DuckDB DATE type requirement

DuckDB's `DATE` type expects the `YYYY-MM-DD` format, whereas the Tushare API returns dates as `YYYYMMDD` strings. They **must** be converted before being used in a SQL query.

### Conversion method

```python
def _format_period_for_sql(self, period: str) -> str:
    """Convert a YYYYMMDD date string to YYYY-MM-DD.

    Args:
        period: date string in YYYYMMDD format

    Returns:
        date string in YYYY-MM-DD format
    """
    return f"{period[:4]}-{period[4:6]}-{period[6:]}"

# Usage example
period = "20240331"
period_sql = self._format_period_for_sql(period)  # "2024-03-31"

query = f'SELECT * FROM "{self.table_name}" WHERE end_date = ?'
result = storage._connection.execute(query, [period_sql])
```
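
String slicing converts any 8-character input without checking it, so a malformed period would only fail later inside the database. As a possible hardening, the sketch below validates the input with the standard library before formatting it. The standalone function `format_period_for_sql` is an illustrative variant, not the method defined in the spec.

```python
from datetime import datetime


def format_period_for_sql(period: str) -> str:
    """Convert YYYYMMDD to YYYY-MM-DD, raising ValueError on bad input."""
    if len(period) != 8 or not period.isdigit():
        raise ValueError(f"expected YYYYMMDD, got {period!r}")
    # strptime also rejects impossible dates such as 20240231.
    return datetime.strptime(period, "%Y%m%d").date().isoformat()


print(format_period_for_sql("20240331"))  # 2024-03-31
```

Failing early with a `ValueError` keeps the error close to the caller that passed the bad period.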

### Where conversion is required

In the following methods, any `period` argument used in a SQL query **must** be converted:

1. `get_local_data_count_by_stock()` - counts local rows per stock
2. `get_local_records_by_key()` - looks up local records by primary key
3. `delete_stock_quarter_data()` - deletes a quarter's data

### Error example

Without the conversion, the following error is raised:

```
Conversion Error: invalid date field format: "20250331",
expected format is (YYYY-MM-DD)
```

---

## Storage layer configuration

### Disabling UPSERT

The financial data tables have no primary key constraint, so UPSERT **must** be disabled when calling storage-layer methods.

### ThreadSafeStorage configuration

```python
class ThreadSafeStorage:
    """Thread-safe wrapper for DuckDB writes."""

    def queue_save(self, name: str, data: pd.DataFrame, use_upsert: bool = True):
        """Put data on the write queue.

        Args:
            name: table name
            data: DataFrame to write
            use_upsert: if True, use INSERT OR REPLACE; if False, use a plain INSERT
        """
        if not data.empty:
            self._pending_writes.append((name, data, use_upsert))
```
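
The spec does not show how the queued `use_upsert` flag is turned into SQL at flush time. The sketch below is one plausible reading of the docstring above, under stated assumptions: rows are plain dicts instead of a DataFrame, and `build_insert_sql`, `PendingWrites`, and `statements()` are hypothetical names that do not appear in the project.

```python
from typing import Any, Dict, List, Tuple


def build_insert_sql(table: str, columns: List[str], use_upsert: bool) -> str:
    """Build a parameterized INSERT statement for one row."""
    verb = "INSERT OR REPLACE" if use_upsert else "INSERT"
    cols = ", ".join(f'"{c}"' for c in columns)
    marks = ", ".join("?" for _ in columns)
    return f'{verb} INTO "{table}" ({cols}) VALUES ({marks})'


class PendingWrites:
    """Hypothetical stand-in for the write queue; rows are plain dicts."""

    def __init__(self) -> None:
        self._pending_writes: List[Tuple[str, List[Dict[str, Any]], bool]] = []

    def queue_save(self, name: str, rows: List[Dict[str, Any]],
                   use_upsert: bool = True) -> None:
        if rows:  # mirrors the `not data.empty` check
            self._pending_writes.append((name, rows, use_upsert))

    def statements(self) -> List[str]:
        """Return the SQL each queued batch would use."""
        return [
            build_insert_sql(name, list(rows[0].keys()), use_upsert)
            for name, rows, use_upsert in self._pending_writes
        ]


queue = PendingWrites()
rows = [{"ts_code": "000001.SZ", "end_date": "2024-03-31"}]
queue.queue_save("income_quarter", rows, use_upsert=False)
queue.queue_save("income_quarter", [], use_upsert=False)  # empty batch is skipped
sqls = queue.statements()
print(sqls[0])
```

Because the flag travels with each queued batch, tables with and without a primary key can share one queue as long as every caller passes the right value.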

### Calls during financial data sync

```python
# Correct: UPSERT disabled
self.storage.queue_save(self.table_name, diff_df, use_upsert=False)

# Wrong: default UPSERT (causes a Binder Error)
self.storage.queue_save(self.table_name, diff_df)  # use_upsert defaults to True
```
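
Since the default is `use_upsert=True`, every financial sync call has to remember the override. One way to reduce that risk is a thin helper that fixes the flag in a single place. This is only a sketch: `queue_financial_save` and the `RecordingStorage` test double are hypothetical and not part of the project.

```python
from typing import Any, List, Tuple


class RecordingStorage:
    """Hypothetical test double that records queue_save calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any, bool]] = []

    def queue_save(self, name: str, data: Any, use_upsert: bool = True) -> None:
        self.calls.append((name, data, use_upsert))


def queue_financial_save(storage: Any, table_name: str, data: Any) -> None:
    """Queue financial rows with UPSERT always disabled."""
    storage.queue_save(table_name, data, use_upsert=False)


storage = RecordingStorage()
queue_financial_save(storage, "income_quarter", ["row-1", "row-2"])
print(storage.calls[0][2])
```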

### Error message

If UPSERT is used by mistake:

```
Binder Error: There are no UNIQUE/PRIMARY KEY constraints that refer
to this table, specify ON CONFLICT columns manually
```

---

## Table structure design
@@ -880,6 +1040,101 @@ print(result)
- List of differing stocks
- Number of deleted/inserted records

### Q7: Why optimize the first sync?

**A**: The first time a quarter is synced there is no local data, so difference detection and deletion are unnecessary. Inserting all rows directly improves performance.

**Optimization logic**:

```python
# Check whether this quarter already has local data
local_counts = self.get_local_data_count_by_stock(period)
is_first_sync_for_period = len(local_counts) == 0

if is_first_sync_for_period:
    # First sync: insert directly and skip difference detection
    print(f"First sync for quarter {period}, inserting all data directly")
    self.storage.queue_save(self.table_name, remote_df, use_upsert=False)
    self.storage.flush()
else:
    # Not the first sync: run difference detection
    diff_df, stats_df = self.compare_and_find_differences(remote_df, period)
    # ... delete the old rows and insert the new ones
```

**Output comparison**:

First sync:
```
[IncomeQuarterSync] Syncing quarter 20240331...
[IncomeQuarterSync] Fetched 5300 records from API
[IncomeQuarterSync] First sync for quarter 20240331, inserting all data directly
[IncomeQuarterSync] Inserted 5300 new records
```

Subsequent sync:
```
[IncomeQuarterSync] Syncing quarter 20240331...
[IncomeQuarterSync] Fetched 5300 records from API
[IncomeQuarterSync] Comparison result:
- Stocks with differences: 100
- Unchanged stocks: 5200
[IncomeQuarterSync] Deleted 100 stocks' old records (approx 500 rows)
[IncomeQuarterSync] Inserted 500 new records
```
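
The two branches can also be sketched without a database, which makes the decision easy to unit-test. The example below is illustrative only: rows are plain tuples rather than a DataFrame, `plan_quarter_sync` is a hypothetical name, and the rule that a stock "differs" when its sorted rows are not identical is an assumption, since the spec does not show how `compare_and_find_differences()` decides.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, str, float]  # (ts_code, end_date, value)


def plan_quarter_sync(local: List[Row], remote: List[Row]) -> Dict[str, object]:
    """Decide what a quarter sync would delete and insert."""
    if not local:
        # First sync: nothing to compare or delete.
        return {"first_sync": True, "delete_stocks": [], "insert_rows": list(remote)}

    def by_stock(rows: List[Row]) -> Dict[str, List[Row]]:
        grouped: Dict[str, List[Row]] = defaultdict(list)
        for row in rows:
            grouped[row[0]].append(row)
        return grouped

    local_by, remote_by = by_stock(local), by_stock(remote)
    # Assumed rule: a stock differs when its sorted rows are not identical.
    # Stocks that exist only locally are left untouched in this sketch.
    diff_stocks = sorted(
        code for code, rows in remote_by.items()
        if sorted(rows) != sorted(local_by.get(code, []))
    )
    insert_rows = [row for code in diff_stocks for row in remote_by[code]]
    return {"first_sync": False, "delete_stocks": diff_stocks, "insert_rows": insert_rows}


remote = [("000001.SZ", "2024-03-31", 1.0), ("000002.SZ", "2024-03-31", 2.0)]
first = plan_quarter_sync([], remote)
later = plan_quarter_sync(
    [("000001.SZ", "2024-03-31", 1.0), ("000002.SZ", "2024-03-31", 9.0)], remote
)
print(first["first_sync"], len(first["insert_rows"]))
print(later["delete_stocks"], later["insert_rows"])
```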

### Q8: Why is a date format error raised?

**A**: DuckDB's `DATE` type expects the `YYYY-MM-DD` format, whereas the system uses `YYYYMMDD` strings. The value must be converted before it is used in a SQL query.

**Wrong**:

```python
# Wrong: passing the YYYYMMDD string directly
query = 'SELECT * FROM table WHERE end_date = ?'
result = storage.execute(query, ["20240331"])
# Error: Conversion Error: invalid date field format: "20240331"
```

**Correct**:

```python
# Correct: convert to YYYY-MM-DD first
period_formatted = f"{period[:4]}-{period[4:6]}-{period[6:]}"
query = 'SELECT * FROM table WHERE end_date = ?'
result = storage.execute(query, [period_formatted])
```

**Methods that need the conversion**:
- `get_local_data_count_by_stock()`
- `get_local_records_by_key()`
- `delete_stock_quarter_data()`

### Q9: Why is an UPSERT error raised?

**A**: The financial data tables have no primary key constraint, so `INSERT OR REPLACE` (UPSERT) cannot be used. Use a plain `INSERT` and rely on the "delete, then insert" strategy to keep the data consistent.

**Error message**:
```
Binder Error: There are no UNIQUE/PRIMARY KEY constraints that refer
to this table, specify ON CONFLICT columns manually
```

**Correct approach**:

```python
# 1. Pass use_upsert=False when calling storage.save()
storage.save(table_name, data, use_upsert=False)

# 2. Pass use_upsert=False when calling queue_save()
self.storage.queue_save(self.table_name, diff_df, use_upsert=False)

# 3. Insert the new rows after deleting the old ones
self.delete_stock_quarter_data(period, diff_stocks)
self.storage.queue_save(self.table_name, diff_df, use_upsert=False)
self.storage.flush()
```
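
To see why "delete, then insert" stays consistent on a table without a primary key, the following self-contained sketch runs the two steps twice and checks that no duplicates appear. It assumes Python's built-in `sqlite3` as a stand-in for DuckDB, and the table `income_quarter`, its columns, and `replace_stock_rows` are hypothetical.

```python
import sqlite3
from typing import List, Tuple

Row = Tuple[str, str, float]  # (ts_code, end_date, revenue)


def replace_stock_rows(conn: sqlite3.Connection, period_sql: str, rows: List[Row]) -> None:
    """Delete the old rows of the affected stocks, then insert the new ones."""
    codes = sorted({row[0] for row in rows})
    if not codes:
        return
    placeholders = ", ".join("?" for _ in codes)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            f"DELETE FROM income_quarter WHERE end_date = ? AND ts_code IN ({placeholders})",
            [period_sql] + codes,
        )
        conn.executemany("INSERT INTO income_quarter VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
# No PRIMARY KEY or UNIQUE constraint, as with the financial tables.
conn.execute("CREATE TABLE income_quarter (ts_code TEXT, end_date TEXT, revenue REAL)")
new_rows = [("000001.SZ", "2024-03-31", 10.0)]
replace_stock_rows(conn, "2024-03-31", new_rows)
replace_stock_rows(conn, "2024-03-31", new_rows)  # re-run: no duplicates
result = conn.execute("SELECT ts_code, end_date, revenue FROM income_quarter").fetchall()
print(result)
```

Wrapping both statements in one transaction means a failed insert does not leave the quarter with its old rows already deleted.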

---

## Appendix
@@ -900,6 +1155,7 @@ print(result)
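
| Date | Version | Changes |
|------|---------|---------|
| 2026-03-08 | v1.1 | Fill in practical coding details:<br>- Add the first-sync optimization notes<br>- Add the date format conversion rule<br>- Add the note on disabling UPSERT in the storage layer<br>- Add the note on handling delete counts<br>- Extend the FAQ (Q7-Q9) |
| 2026-03-07 | v1.0 | Initial version; defines the requirements for wrapping the financial data API |

---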