Factor Names and Expressions

Data Sources

  • Notebook analyzed: main/train/Classify2_load_model.ipynb
  • Factor modules: main/factor/factor.py, main/factor/money_factor.py, main/factor/utils.py

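I. Financial Factors

1. add_financial_factor family

Factor names: undist_profit_ps, ocfps, roa, roe

Expression:

  • merge_asof matches the financial indicator data to every trading day by stock code and announcement date.
  • Matching logic: backward search, i.e. the most recent financial record announced on or before trade_date.
  • Formula: factor_value (the matched indicator) is used directly as the factor value.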
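The backward as-of match described above can be sketched with pandas. This is a minimal illustration under assumptions: the column names ts_code and ann_date and the toy values are hypothetical and may differ from what main/factor/factor.py actually uses.

```python
import pandas as pd

# Daily bars: one row per stock per trading day (toy data).
daily = pd.DataFrame({
    "ts_code": ["000001.SZ"] * 4,
    "trade_date": pd.to_datetime(
        ["2024-04-29", "2024-04-30", "2024-08-30", "2024-09-02"]
    ),
})

# Financial indicators keyed by announcement date (ann_date is an assumed name).
fina = pd.DataFrame({
    "ts_code": ["000001.SZ", "000001.SZ"],
    "ann_date": pd.to_datetime(["2024-04-30", "2024-08-31"]),
    "roe": [2.5, 5.1],
})

# merge_asof requires both frames to be sorted by their time keys.
merged = pd.merge_asof(
    daily.sort_values("trade_date"),
    fina.sort_values("ann_date"),
    left_on="trade_date",
    right_on="ann_date",
    by="ts_code",
    direction="backward",  # latest record with ann_date <= trade_date
)
```

Trading days that precede the first announcement have no match and stay NaN, which avoids look-ahead into figures that were not yet public.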

2. calculate_cashflow_to_ev_factor

Factor name: cashflow_to_ev_factor

Expression:

enterprise_value = total_mv * 10000 + total_liab - money_cap
cashflow_to_ev_factor = n_cashflow_act / enterprise_value

3. caculate_book_to_price_ratio

Factor name: book_to_price_ratio

Expression:

book_to_price_ratio = bps / close

II. ARBR Factors

4. calculate_arbr

Factor names: AR, BR, AR_BR

Expression:

# Intermediate quantities
h_minus_o = high - open
o_minus_l = open - low
prev_close = close.shift(1)
h_minus_pc_pos = max(0, high - prev_close)
pc_minus_l_pos = max(0, prev_close - low)

# AR and BR
AR = sum(h_minus_o, N) / sum(o_minus_l, N) * 100
BR = sum(h_minus_pc_pos, N) / sum(pc_minus_l_pos, N) * 100
AR_BR = AR - BR
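A pandas sketch of the AR/BR expressions above, for a single stock with rows sorted by date. The document does not state N; the default of 26 below is an assumption (a common convention for ARBR), and the function name arbr is hypothetical.

```python
import pandas as pd

def arbr(df: pd.DataFrame, n: int = 26) -> pd.DataFrame:
    """AR, BR and AR_BR for one stock; n=26 is an assumed default."""
    prev_close = df["close"].shift(1)
    h_minus_o = df["high"] - df["open"]
    o_minus_l = df["open"] - df["low"]
    # clip(lower=0) is the element-wise max(0, x); the NaN on the first row is kept.
    h_minus_pc_pos = (df["high"] - prev_close).clip(lower=0)
    pc_minus_l_pos = (prev_close - df["low"]).clip(lower=0)

    out = pd.DataFrame(index=df.index)
    out["AR"] = h_minus_o.rolling(n).sum() / o_minus_l.rolling(n).sum() * 100
    out["BR"] = h_minus_pc_pos.rolling(n).sum() / pc_minus_l_pos.rolling(n).sum() * 100
    out["AR_BR"] = out["AR"] - out["BR"]
    return out
```

Because prev_close is undefined on the first row, BR becomes available one row later than AR.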

III. Technical Indicator Factors

5. turnover_rate_n

Factor name: turnover_rate_mean_5

Expression:

turnover_rate_mean_5 = mean(turnover_rate, window=5)

6. variance_n

Factor name: variance_20

Expression:

variance_20 = var(pct_chg, window=20)

7. bbi_ratio_factor

Factor name: bbi_ratio_factor

Expression:

SMA3 = mean(close, 3)
SMA6 = mean(close, 6)
SMA12 = mean(close, 12)
SMA24 = mean(close, 24)
BBI = (SMA3 + SMA6 + SMA12 + SMA24) / 4
bbi_ratio_factor = BBI / close
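The BBI ratio above is an average of four simple moving averages divided by the close. A minimal pandas sketch for one stock's close series (the function name bbi_ratio is hypothetical):

```python
import pandas as pd

def bbi_ratio(close: pd.Series) -> pd.Series:
    """BBI / close, where BBI averages the 3-, 6-, 12- and 24-period SMAs."""
    smas = [close.rolling(w).mean() for w in (3, 6, 12, 24)]
    bbi = sum(smas) / 4
    return bbi / close
```

The first 23 rows are NaN because the 24-period average is not yet defined.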

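IV. Deviation Factors

8. daily_deviation

Factor name: daily_deviation

Expression:

# Daily momentum benchmarks, computed per trade_date
daily_positive_benchmark = mean(pct_chg[pct_chg > 0])  # mean pct_chg of the stocks that rose that day
daily_negative_benchmark = mean(pct_chg[pct_chg < 0])  # mean pct_chg of the stocks that fell that day

# Deviation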
if pct_chg > 0 and daily_positive_benchmark > 0:
    daily_deviation = pct_chg - daily_positive_benchmark
elif pct_chg < 0 and daily_negative_benchmark < 0:
    daily_deviation = pct_chg - daily_negative_benchmark
else:
    daily_deviation = 0
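The row-wise if/elif/else above can be vectorized. The sketch below is one possible formulation, not necessarily how the factor module does it; the function name and the keys parameter are hypothetical. Passing keys=("trade_date", "cat_l2_code") would give the per-industry benchmarks used by daily_industry_deviation.

```python
import numpy as np
import pandas as pd

def daily_deviation(df: pd.DataFrame, keys=("trade_date",)) -> pd.Series:
    """Deviation of pct_chg from the same-sign group benchmark."""
    group_keys = [df[k] for k in keys]
    pct = df["pct_chg"]
    # where() masks the other sign to NaN; the group mean skips NaN.
    pos_bench = pct.where(pct > 0).groupby(group_keys).transform("mean")
    neg_bench = pct.where(pct < 0).groupby(group_keys).transform("mean")
    values = np.select(
        [(pct > 0) & (pos_bench > 0), (pct < 0) & (neg_bench < 0)],
        [pct - pos_bench, pct - neg_bench],
        default=0.0,  # flat stocks, or no same-sign benchmark in the group
    )
    return pd.Series(values, index=df.index, name="daily_deviation")
```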

9. daily_industry_deviation

Factor name: daily_industry_deviation

Expression:

# Daily industry momentum benchmarks
daily_industry_positive_benchmark = mean(pct_chg[pct_chg > 0])  # grouped by trade_date + cat_l2_code
daily_industry_negative_benchmark = mean(pct_chg[pct_chg < 0])  # grouped by trade_date + cat_l2_code

# Industry deviation
if pct_chg > 0 and daily_industry_positive_benchmark > 0:
    daily_industry_deviation = pct_chg - daily_industry_positive_benchmark
elif pct_chg < 0 and daily_industry_negative_benchmark < 0:
    daily_industry_deviation = pct_chg - daily_industry_negative_benchmark
else:
    daily_industry_deviation = 0

V. Rolling & Simple Factors

10. Factors generated by get_rolling_factor

Money flow factors

| Factor name | Expression |
| --- | --- |
| `lg_elg_net_buy_vol` | `(buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol)` |
| `flow_lg_elg_intensity` | `lg_elg_net_buy_vol / (vol + epsilon)` |
| `sm_net_buy_vol` | `buy_sm_vol - sell_sm_vol` |
| `flow_divergence_diff` | `sm_net_buy_vol - lg_elg_net_buy_vol` |
| `flow_divergence_ratio` | `sm_net_buy_vol / (lg_elg_net_buy_vol + sign(lg_elg_net_buy_vol) * epsilon + epsilon)` |
| `total_buy_vol` | `buy_sm_vol + buy_lg_vol + buy_elg_vol` |
| `lg_elg_buy_prop` | `(buy_lg_vol + buy_elg_vol) / (total_buy_vol + epsilon)` |
| `flow_struct_buy_change` | `diff(lg_elg_buy_prop, 1)` |
| `lg_elg_net_buy_vol_change` | `diff(lg_elg_net_buy_vol, 1)` |
| `flow_lg_elg_accel` | `diff(lg_elg_net_buy_vol_change, 1)` |
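Several money flow factors guard their denominators with epsilon. The following sketch covers four of them for illustration; the function name flow_factors is hypothetical, and EPSILON takes the value given in the appendix.

```python
import numpy as np
import pandas as pd

EPSILON = 1e-10  # value listed in the appendix

def flow_factors(df: pd.DataFrame) -> pd.DataFrame:
    """A subset of the money flow factors, with epsilon-guarded divisions."""
    lg = df["buy_lg_vol"] + df["buy_elg_vol"] - df["sell_lg_vol"] - df["sell_elg_vol"]
    sm = df["buy_sm_vol"] - df["sell_sm_vol"]

    out = pd.DataFrame(index=df.index)
    out["lg_elg_net_buy_vol"] = lg
    out["flow_lg_elg_intensity"] = lg / (df["vol"] + EPSILON)
    out["flow_divergence_diff"] = sm - lg
    # As written in the table: the two epsilon terms add up for positive lg,
    # cancel for negative lg, and leave a bare epsilon when lg == 0.
    out["flow_divergence_ratio"] = sm / (lg + np.sign(lg) * EPSILON + EPSILON)
    return out
```

With this denominator, a day where large-order net buying is exactly zero yields a very large ratio rather than a division error, so downstream clipping or winsorizing may be worth considering.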

Chip distribution factors

| Factor name | Expression |
| --- | --- |
| `chip_concentration_range` | `(cost_95pct - cost_5pct) / (close + epsilon)` |
| `chip_skewness` | `(weight_avg - cost_50pct) / (cost_50pct + epsilon)` |
| `floating_chip_proxy` | `winner_rate * max(0, (close - cost_15pct) / (close + epsilon))` |
| `cost_support_15pct_change` | `pct_change(cost_15pct, 1) * 100` |
| `cat_winner_price_zone` | categorical: 1 = high-risk zone, 2 = low-potential zone, 3 = upper-middle profit zone, 4 = lower-middle loss zone |
| `flow_chip_consistency` | `lg_elg_net_buy_vol * price_near_low_support` |
| `profit_taking_vs_absorb` | `lg_elg_net_buy_vol * (winner_rate > 0.7)` |

Volatility factors

| Factor name | Expression |
| --- | --- |
| `upside_vol` | `std(pos_returns, window=20)` |
| `downside_vol` | `std(neg_returns, window=20)` |
| `vol_ratio` | `upside_vol / downside_vol` |
| `return_skew` | `skew(pct_chg, window=5)` |
| `return_kurtosis` | `kurt(pct_chg, window=5)` |
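The table does not define pos_returns and neg_returns. The sketch below assumes they are pct_chg with the opposite-sign days masked to NaN, and assumes min_periods=2 so that a window containing masked days still produces a value. Both choices are assumptions rather than the module's confirmed behavior, and the function name is hypothetical.

```python
import pandas as pd

def volatility_factors(pct_chg: pd.Series, window: int = 20,
                       moment_window: int = 5) -> pd.DataFrame:
    """Upside/downside volatility plus rolling skewness and kurtosis."""
    # Assumption: keep only same-sign days, mask the rest to NaN.
    pos_returns = pct_chg.where(pct_chg > 0)
    neg_returns = pct_chg.where(pct_chg < 0)

    out = pd.DataFrame(index=pct_chg.index)
    # Assumption: min_periods=2, otherwise any NaN in the window gives NaN.
    out["upside_vol"] = pos_returns.rolling(window, min_periods=2).std()
    out["downside_vol"] = neg_returns.rolling(window, min_periods=2).std()
    out["vol_ratio"] = out["upside_vol"] / out["downside_vol"]
    out["return_skew"] = pct_chg.rolling(moment_window).skew()
    out["return_kurtosis"] = pct_chg.rolling(moment_window).kurt()
    return out
```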

Volume factors

| Factor name | Expression |
| --- | --- |
| `volume_change_rate` | `mean(vol, 2) / mean(vol, 10) - 1` |
| `cat_volume_breakout` | `vol > max(vol, 5)` |
| `turnover_deviation` | `(turnover_rate - mean(turnover_rate, 3)) / std(turnover_rate, 3)` |
| `cat_turnover_spike` | `turnover_rate > mean(turnover_rate, 3) + 2 * std(turnover_rate, 3)` |
| `avg_volume_ratio` | `mean(volume_ratio, 3)` |
| `cat_volume_ratio_breakout` | `volume_ratio > max(volume_ratio, 5)` |
| `vol_spike` | `mean(vol, 20)` |
| `vol_std_5` | `std(pct_change(vol), 5)` |

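Technical indicators

| Factor name | Expression |
| --- | --- |
| `atr_14` | `ATR(high, low, close, 14)` (TA-Lib) |
| `atr_6` | `ATR(high, low, close, 6)` (TA-Lib) |
| `obv` | `OBV(close, vol)` (TA-Lib) |
| `maobv_6` | `SMA(obv, 6)` (TA-Lib) |
| `rsi_3` | `RSI(close, 3)` (TA-Lib) |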

Return factors

| Factor name | Expression |
| --- | --- |
| `return_5` | `close / close.shift(5) - 1` |
| `return_20` | `close / close.shift(20) - 1` |
| `std_return_5` | `std(pct_change(close), 5)` |
| `std_return_90` | `std(pct_change(close), 90)` |
| `std_return_90_2` | `std(pct_change(close.shift(10)), 90)` |

EMA factors

| Factor name | Expression |
| --- | --- |
| `act_factor1` | `atan((EMA(close,5)/EMA(close,5).shift(1)-1)*100) * 57.3 / 50` |
| `act_factor2` | `atan((EMA(close,13)/EMA(close,13).shift(1)-1)*100) * 57.3 / 40` |
| `act_factor3` | `atan((EMA(close,20)/EMA(close,20).shift(1)-1)*100) * 57.3 / 21` |
| `act_factor4` | `atan((EMA(close,60)/EMA(close,60).shift(1)-1)*100) * 57.3 / 10` |
| `rank_act_factor1` | `rank(act_factor1, pct=True)` |
| `rank_act_factor2` | `rank(act_factor2, pct=True)` |
| `rank_act_factor3` | `rank(act_factor3, pct=True)` |
| `log_circ_mv` | `log(circ_mv)` |
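Each act_factor converts the one-day percentage slope of an EMA into an angle: the constant 57.3 is approximately 180/π, so atan(...) * 57.3 is the slope angle in degrees, which is then divided by a per-factor scale. The sketch below uses pandas ewm in place of TA-Lib's EMA, which is an assumption; the two can differ in how they seed the first values. The names act_factor and ACT_PARAMS are hypothetical.

```python
import numpy as np
import pandas as pd

# (EMA period, scale) pairs taken from the table above.
ACT_PARAMS = {
    "act_factor1": (5, 50),
    "act_factor2": (13, 40),
    "act_factor3": (20, 21),
    "act_factor4": (60, 10),
}

def act_factor(close: pd.Series, span: int, scale: float) -> pd.Series:
    """Scaled slope angle (in degrees) of the EMA of close."""
    # Assumption: pandas ewm stands in for TA-Lib's EMA here.
    ema = close.ewm(span=span, adjust=False).mean()
    slope_pct = (ema / ema.shift(1) - 1) * 100
    return np.arctan(slope_pct) * 57.3 / scale  # 57.3 ~ 180/pi
```

For example, `act_factor(close, *ACT_PARAMS["act_factor2"])` would give the 13-period variant for one stock's close series.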

Alpha factors

| Factor name | Expression |
| --- | --- |
| `cov` | `cov(high, vol, window=5)` |
| `delta_cov` | `diff(cov, 5)` |
| `_stddev_close` | `std(close, 20)` |
| `_rank_stddev` | `rank(_stddev_close, pct=True)` |
| `alpha_22_improved` | `-1 * delta_cov * _rank_stddev` |
| `alpha_003` | `(close - open) / (high - low)` if `high != low`, else `0` |
| `alpha_007` | `rank(rolling_corr(close, vol, 5), pct=True)` |
| `alpha_013` | `rank(sum(close, 5) - sum(close, 20), pct=True)` |
Chip factors

| Factor name | Expression |
| --- | --- |
| `vol_break` | `1 if (close > cost_85pct) & (volume_ratio > 2) else 0` |
| `weight_roc5` | `pct_change(weight_avg, 5)` |
| `price_cost_divergence` | `corr(pct_change(close), pct_change(weight_avg), 10)` |
| `smallcap_concentration` | `(1 / log_circ_mv) * (cost_85pct - cost_15pct)` |
| `cost_stability` | `std(weight_avg, 20) / mean(weight_avg, 20)` |
| `high_cost_break_days` | `sum(close > cost_95pct, 5)` |
| `liquidity_risk` | `(cost_95pct - cost_5pct) / mean(vol, 10)` |
| `turnover_std` | `std(turnover_rate, 20)` |
| `mv_volatility` | `turnover_std / log_circ_mv` |
| `volume_growth` | `pct_change(vol, 20)` |
| `mv_growth` | `volume_growth / log_circ_mv` |
| `momentum_factor` | `volume_change_rate + 0.5 * turnover_deviation` |
| `resonance_factor` | `vol_ratio * pct_chg` |
| `log_close` | `log(close)` |
| `cat_vol_spike` | `vol > 2 * vol_spike` |
| `up` | `(high - max(close, open)) / close` |
| `down` | `(min(close, open) - low) / close` |
| `obv_maobv_6` | `obv - maobv_6` |
| `std_return_5_over_std_return_90` | `std_return_5 / std_return_90` |
| `std_return_90_minus_std_return_90_2` | `std_return_90 - std_return_90_2` |
| `cat_af2` | `act_factor2 > act_factor1` |
| `cat_af3` | `act_factor3 > act_factor2` |
| `cat_af4` | `act_factor4 > act_factor3` |
| `act_factor5` | `act_factor1 + act_factor2 + act_factor3 + act_factor4` |
| `act_factor6` | `(act_factor1 - act_factor2) / sqrt(act_factor1^2 + act_factor2^2)` |
| `active_buy_volume_large` | `buy_lg_vol / net_mf_vol` |
| `active_buy_volume_big` | `buy_elg_vol / net_mf_vol` |
| `active_buy_volume_small` | `buy_sm_vol / net_mf_vol` |
| `buy_lg_vol_minus_sell_lg_vol` | `(buy_lg_vol - sell_lg_vol) / net_mf_vol` |
| `buy_elg_vol_minus_sell_elg_vol` | `(buy_elg_vol - sell_elg_vol) / net_mf_vol` |
| `ctrl_strength` | `(cost_85pct - cost_15pct) / (his_high - his_low)` |
| `low_cost_dev` | `(close - cost_5pct) / (cost_50pct - cost_5pct)` |
| `asymmetry` | `(cost_95pct - cost_50pct) / (cost_50pct - cost_5pct)` |
| `lock_factor` | `turnover_rate * (1 - (cost_95pct - cost_5pct) / (his_high - his_low))` |
| `cat_vol_break` | `(close > cost_85pct) & (volume_ratio > 2)` |
| `cost_atr_adj` | `(cost_95pct - cost_5pct) / atr_14` |
| `cat_golden_resonance` | `(close > weight_avg) & (volume_ratio > 1.5) & (winner_rate > 0.7)` |
| `mv_turnover_ratio` | `turnover_rate / log_circ_mv` |
| `mv_adjusted_volume` | `vol / log_circ_mv` |
| `mv_weighted_turnover` | `turnover_rate / log_circ_mv` |
| `nonlinear_mv_volume` | `vol / log_circ_mv` |
| `mv_volume_ratio` | `volume_ratio / log_circ_mv` |
| `mv_momentum` | `turnover_rate * volume_ratio / log_circ_mv` |

VI. Money Flow Factors

11. lg_flow_mom_corr_20_60

Expression:

net_lg_flow_val = (buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol) * close
rolling_net_lg_flow = sum(net_lg_flow_val, 20)
price_mom = pct_change(close, 20)
lg_flow_mom_corr_20_60 = corr(rolling_net_lg_flow, price_mom, 60)
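The factor name encodes its two windows: a 20-period aggregation followed by a 60-period rolling correlation. A single-stock sketch, with the windows exposed as hypothetical parameters:

```python
import pandas as pd

def lg_flow_mom_corr(df: pd.DataFrame, flow_window: int = 20,
                     corr_window: int = 60) -> pd.Series:
    """Rolling correlation between cumulative large-order flow and momentum."""
    net_lg_flow_val = (
        df["buy_lg_vol"] + df["buy_elg_vol"]
        - df["sell_lg_vol"] - df["sell_elg_vol"]
    ) * df["close"]
    rolling_net_lg_flow = net_lg_flow_val.rolling(flow_window).sum()
    price_mom = df["close"].pct_change(flow_window)
    # Both inputs need flow_window rows of history, and the correlation needs
    # corr_window valid pairs, so the leading rows of the result are NaN.
    return rolling_net_lg_flow.rolling(corr_window).corr(price_mom)
```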

12. lg_flow_accel

Expression:

net_lg_flow_vol = buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol
lg_flow_accel = diff(diff(net_lg_flow_vol, 1), 1)

13. profit_pressure

Expression:

profit_margin_85 = close / cost_85pct - 1
profit_margin_95 = close / cost_95pct - 1
profit_pressure = winner_rate * 0.5 * (profit_margin_85 + profit_margin_95)

14. underwater_resistance

Expression:

underwater_ratio = 1.0 - winner_rate
dist_to_cost_15 = max(0, cost_15pct - close) / (close + epsilon)
underwater_resistance = underwater_ratio * dist_to_cost_15

15. cost_conc_std_20

Expression:

cost_range_norm = (cost_85pct - cost_15pct) / (weight_avg + epsilon)
cost_conc_std_20 = std(cost_range_norm, 20)

16. profit_decay_20

Expression:

ret_20 = close / close.shift(20) - 1
winner_rate_change_20 = diff(winner_rate, 20)
profit_decay_20 = ret_20 / winner_rate_change_20

17. vol_amp_loss_20

Expression:

vol_20 = std(pct_chg, 20)
loss_degree = max(0, weight_avg - close) / (close + epsilon)
vol_amp_loss_20 = vol_20 * loss_degree

18. vol_drop_profit_cnt_5

Expression:

is_profitable = close > weight_avg * (1 + 0.1)
is_dropping = pct_chg < -0.03
rolling_mean_vol = mean(vol, 20)
rolling_std_vol = std(vol, 20)
is_high_vol = vol > (rolling_mean_vol + 2 * rolling_std_vol)
event = is_profitable & is_dropping & is_high_vol
vol_drop_profit_cnt_5 = sum(event, 5)
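The event count above combines three boolean conditions and then sums them over a short window. In the single-stock sketch below, the function name and parameters are hypothetical. Note that the -0.03 threshold presumes pct_chg is stored as a decimal fraction; if the data holds percentages, the threshold would presumably need to be -3.

```python
import pandas as pd

def vol_drop_profit_cnt(df: pd.DataFrame, vol_window: int = 20,
                        count_window: int = 5) -> pd.Series:
    """Count of high-volume drops from a profitable position."""
    is_profitable = df["close"] > df["weight_avg"] * (1 + 0.1)
    is_dropping = df["pct_chg"] < -0.03  # assumes decimal returns
    rolling_mean_vol = df["vol"].rolling(vol_window).mean()
    rolling_std_vol = df["vol"].rolling(vol_window).std()
    # Comparisons against NaN are False, so warm-up rows never count as events.
    is_high_vol = df["vol"] > (rolling_mean_vol + 2 * rolling_std_vol)
    event = is_profitable & is_dropping & is_high_vol
    # Cast to int so the rolling sum counts events.
    return event.astype(int).rolling(count_window).sum()
```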

19. lg_flow_vol_interact_20

Expression:

vol_20 = std(pct_chg, 20)
net_lg_flow_val = (buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol) * close
total_val = vol * close
abs_net_lg_flow_ratio = abs(net_lg_flow_val) / (total_val + epsilon)
abs_net_lg_flow_ratio_20 = mean(abs_net_lg_flow_ratio, 20)
lg_flow_vol_interact_20 = vol_20 * abs_net_lg_flow_ratio_20

20. cost_break_confirm_cnt_5

Expression:

prev_cost_85 = cost_85pct.shift(1)
prev_cost_15 = cost_15pct.shift(1)
break_up = close > prev_cost_85
break_down = close < prev_cost_15
net_lg_flow_vol = buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol
confirm_up = break_up & (net_lg_flow_vol > 0)
confirm_down = break_down & (net_lg_flow_vol < 0)
net_confirm = confirm_up - confirm_down
cost_break_confirm_cnt_5 = sum(net_confirm, 5)
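In the expression above, confirm_up and confirm_down are boolean, so a pandas implementation would likely cast them to integers before subtracting, giving a signed signal of +1, 0 or -1 per day. A single-stock sketch under that assumption (the function name is hypothetical):

```python
import pandas as pd

def cost_break_confirm_cnt(df: pd.DataFrame, window: int = 5) -> pd.Series:
    """Net count of flow-confirmed breaks of the previous day's cost band."""
    break_up = df["close"] > df["cost_85pct"].shift(1)
    break_down = df["close"] < df["cost_15pct"].shift(1)
    net_lg_flow_vol = (
        df["buy_lg_vol"] + df["buy_elg_vol"]
        - df["sell_lg_vol"] - df["sell_elg_vol"]
    )
    confirm_up = break_up & (net_lg_flow_vol > 0)
    confirm_down = break_down & (net_lg_flow_vol < 0)
    # Cast to int first so the difference is a signed -1 / 0 / +1 signal.
    net_confirm = confirm_up.astype(int) - confirm_down.astype(int)
    return net_confirm.rolling(window).sum()
```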

21. atr_norm_channel_pos_14

Expression:

tr = max(high - low, abs(high - prev_close), abs(low - prev_close))
atr_14 = mean(tr, 14)
roll_low_14 = min(low, 14)
atr_norm_channel_pos_14 = (close - roll_low_14) / atr_14
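A single-stock sketch of the channel position above. It follows the expression as written and averages the true range with a simple rolling mean; TA-Lib's ATR uses Wilder smoothing instead, so this atr would generally differ from a TA-Lib atr_14. The function name is hypothetical.

```python
import pandas as pd

def atr_norm_channel_pos(df: pd.DataFrame, n: int = 14) -> pd.Series:
    """Distance of close above the n-period low, in units of ATR."""
    prev_close = df["close"].shift(1)
    tr = pd.concat(
        [
            df["high"] - df["low"],
            (df["high"] - prev_close).abs(),
            (df["low"] - prev_close).abs(),
        ],
        axis=1,
    ).max(axis=1)  # row-wise max skips NaN, so the first row is high - low
    atr = tr.rolling(n).mean()  # simple mean, as in the expression above
    roll_low = df["low"].rolling(n).min()
    return (df["close"] - roll_low) / atr
```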

22. turnover_diff_skew_20

Expression:

turnover_diff = diff(turnover_rate, 1)
turnover_diff_skew_20 = skew(turnover_diff, 20)

23. lg_sm_flow_diverge_20

Expression:

lg_flow_ratio = (buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol) / vol
sm_flow_ratio = (buy_sm_vol - sell_sm_vol) / vol
lg_flow_ratio_20 = mean(lg_flow_ratio, 20)
sm_flow_ratio_20 = mean(sm_flow_ratio, 20)
lg_sm_flow_diverge_20 = lg_flow_ratio_20 - sm_flow_ratio_20

24. pullback_strong_20_20

Expression:

high_20 = max(high, 20)
pullback_depth = (high_20 - close) / high_20
recent_gain_20 = close / close.shift(20) - 1
pullback_strong_20_20 = pullback_depth / recent_gain_20

25. vol_wgt_hist_pos_20

Expression:

hist_pos = (close - his_low) / (his_high - his_low)
rolling_mean_vol_20 = mean(vol, 20)
vol_rel_strength = vol / rolling_mean_vol_20
vol_wgt_hist_pos_20 = hist_pos * vol_rel_strength

26. vol_adj_roc_20

Expression:

roc_20 = close / close.shift(20) - 1
vol_20 = std(pct_chg, 20)
vol_adj_roc_20 = roc_20 / vol_20

VII. Cross-Sectional Rank Factors

27. cs_rank_net_lg_flow_val

Expression:

net_lg_flow_val = (buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol) * close
cs_rank_net_lg_flow_val = rank(net_lg_flow_val, pct=True)

28. cs_rank_flow_divergence

Expression:

lg_ratio = (buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol) / vol
sm_ratio = (buy_sm_vol - sell_sm_vol) / vol
divergence = lg_ratio - sm_ratio
cs_rank_flow_divergence = rank(divergence, pct=True)

29. cs_rank_ind_adj_lg_flow

Expression:

net_lg_flow_vol = (buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol) * close
industry_avg_flow = mean(net_lg_flow_vol) by trade_date, cat_l2_code
deviation = net_lg_flow_vol - industry_avg_flow
cs_rank_ind_adj_lg_flow = rank(deviation, pct=True)
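The industry adjustment above uses two levels of grouping: the industry mean is taken per (trade_date, cat_l2_code), while the final percentile rank is taken across all stocks on the same trade_date. A sketch on a long-format frame with one row per stock per day (the function name is hypothetical; the intermediate variable keeps the document's name even though it holds a value, since it is multiplied by close):

```python
import pandas as pd

def cs_rank_ind_adj_lg_flow(df: pd.DataFrame) -> pd.Series:
    """Cross-sectional percentile rank of industry-adjusted large-order flow."""
    net_lg_flow_vol = (
        df["buy_lg_vol"] + df["buy_elg_vol"]
        - df["sell_lg_vol"] - df["sell_elg_vol"]
    ) * df["close"]
    # Industry mean per day, broadcast back to every row.
    industry_avg_flow = net_lg_flow_vol.groupby(
        [df["trade_date"], df["cat_l2_code"]]
    ).transform("mean")
    deviation = net_lg_flow_vol - industry_avg_flow
    # Percentile rank within each trading day; ties share the average rank.
    return deviation.groupby(df["trade_date"]).rank(pct=True)
```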

30. cs_rank_elg_buy_ratio

Expression:

elg_buy_ratio = buy_elg_vol / vol
cs_rank_elg_buy_ratio = rank(elg_buy_ratio, pct=True)

31. cs_rank_rel_profit_margin

Expression:

profit_margin = (close - weight_avg) / close
cs_rank_rel_profit_margin = rank(profit_margin, pct=True)

32. cs_rank_cost_breadth

Expression:

cost_breadth = (cost_85pct - cost_15pct) / weight_avg
cs_rank_cost_breadth = rank(cost_breadth, pct=True)

33. cs_rank_dist_to_upper_cost

Expression:

dist_to_95 = close / cost_95pct
cs_rank_dist_to_upper_cost = rank(dist_to_95, pct=True)

34. cs_rank_winner_rate

Expression:

cs_rank_winner_rate = rank(winner_rate, pct=True)

35. cs_rank_intraday_range

Expression:

norm_range = (high - low) / close
cs_rank_intraday_range = rank(norm_range, pct=True)

36. cs_rank_close_pos_in_range

Expression:

close_pos = (close - low) / (high - low)
cs_rank_close_pos_in_range = rank(close_pos, pct=True)

37. cs_rank_opening_gap

Expression:

gap = open / pre_close - 1
cs_rank_opening_gap = rank(gap, pct=True)

38. cs_rank_pos_in_hist_range

Expression:

hist_pos = (close - his_low) / (his_high - his_low)
cs_rank_pos_in_hist_range = rank(hist_pos, pct=True)

39. cs_rank_vol_x_profit_margin

Expression:

daily_vol = abs(pct_chg)
profit_margin = (close - weight_avg) / close
interaction = daily_vol * profit_margin
cs_rank_vol_x_profit_margin = rank(interaction, pct=True)

40. cs_rank_lg_flow_price_concordance

Expression:

net_lg_flow_vol = buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol
concordance = net_lg_flow_vol * pct_chg
cs_rank_lg_flow_price_concordance = rank(concordance, pct=True)

41. cs_rank_turnover_per_winner

Expression:

turnover_per_winner = turnover_rate / winner_rate
cs_rank_turnover_per_winner = rank(turnover_per_winner, pct=True)

42. cs_rank_ind_cap_neutral_pe

Expression: placeholder; the implementation requires statsmodels.

43. cs_rank_volume_ratio

Expression:

cs_rank_volume_ratio = rank(volume_ratio, pct=True)

44. cs_rank_elg_buy_sell_sm_ratio

Expression:

ratio = buy_elg_vol / sell_sm_vol
cs_rank_elg_buy_sell_sm_ratio = rank(ratio, pct=True)

45. cs_rank_cost_dist_vol_ratio

Expression:

dist = abs(close - weight_avg) / (close + epsilon)
interaction = dist * volume_ratio
cs_rank_cost_dist_vol_ratio = rank(interaction, pct=True)

46. cs_rank_size

Expression:

log_circ_mv = log1p(circ_mv)
cs_rank_size = rank(log_circ_mv, pct=True)

VIII. Industry Factors

47. get_act_factor (from main.utils.factor)

Generated factors: act_factor1, act_factor2, act_factor3, act_factor4

Expression:

obv = OBV(close, vol)
return_5 = close / close.shift(5) - 1
return_20 = close / close.shift(20) - 1
return_5_percentile = rank(return_5, pct=True)
return_20_percentile = rank(return_20, pct=True)

Column renaming: industry factor columns are prefixed with industry_


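Appendix: Notation

| Symbol | Meaning |
| --- | --- |
| `epsilon` | Small constant (1e-10) that prevents division by zero |
| `mean(x, N)` | N-period rolling mean |
| `std(x, N)` | N-period rolling standard deviation |
| `var(x, N)` | N-period rolling variance |
| `sum(x, N)` | N-period rolling sum |
| `max(x, N)` | N-period rolling maximum |
| `min(x, N)` | N-period rolling minimum |
| `diff(x, N)` | N-period difference |
| `pct_change(x, N)` | N-period percentage change |
| `shift(x, N)` | N-period shift (lag) |
| `rank(x, pct=True)` | Cross-sectional rank (as a percentile) |
| `corr(x, y, N)` | N-period rolling correlation coefficient |
| `cov(x, y, N)` | N-period rolling covariance |
| `skew(x, N)` | N-period rolling skewness |
| `kurt(x, N)` | N-period rolling kurtosis |
| `ATR` | Average True Range (TA-Lib) |
| `OBV` | On-Balance Volume (TA-Lib) |
| `RSI` | Relative Strength Index (TA-Lib) |
| `SMA` | Simple Moving Average (TA-Lib) |
| `EMA` | Exponential Moving Average (TA-Lib) |
| `atan` | Arctangent function |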
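The notation splits into two kinds of operators: rolling operators run along time within one stock, while rank runs across stocks within one trading day. The sketch below shows how a few of them might map onto pandas for a long-format frame. The helper names ts_mean, ts_diff and cs_rank and the ts_code column are hypothetical, not the project's API.

```python
import pandas as pd

def ts_mean(df: pd.DataFrame, col: str, n: int) -> pd.Series:
    """mean(x, N): rolling mean computed separately for each stock."""
    return df.groupby("ts_code")[col].transform(lambda s: s.rolling(n).mean())

def ts_diff(df: pd.DataFrame, col: str, n: int) -> pd.Series:
    """diff(x, N): N-period difference within each stock."""
    return df.groupby("ts_code")[col].diff(n)

def cs_rank(df: pd.DataFrame, col: str) -> pd.Series:
    """rank(x, pct=True): percentile rank across stocks on each trade_date."""
    return df.groupby("trade_date")[col].rank(pct=True)
```

Grouping by stock before any rolling operation matters on long-format data: without it, a window would run across the boundary between one stock's rows and the next.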

Document generated: 2026-03-06. Covers 180+ factors.