# Factor Names and Expressions

## Data Sources

- Analysis notebook: `main/train/Classify2_load_model.ipynb`
- Factor modules: `main/factor/factor.py`, `main/factor/money_factor.py`, `main/factor/utils.py`

---

## I. Financial Factors

### 1. add_financial_factor series

**Factor names**: `undist_profit_ps`, `ocfps`, `roa`, `roe`

**Expression**:

- Uses `merge_asof` to attach financial-indicator data to every trading day, matching on stock code and announcement date
- Matching logic: backward search (take the most recent financial record whose announcement date is ≤ `trade_date`)
- Formula: the matched `factor_value` is used directly as the factor value

### 2. calculate_cashflow_to_ev_factor

**Factor name**: `cashflow_to_ev_factor`

**Expression**:

```
Enterprise Value = total_mv * 10000 + total_liab - money_cap
cashflow_to_ev_factor = n_cashflow_act / Enterprise Value
```

### 3. caculate_book_to_price_ratio

**Factor name**: `book_to_price_ratio`

**Expression**:

```
book_to_price_ratio = bps / close
```

---

## II. ARBR Factors

### 4. calculate_arbr

**Factor names**: `AR`, `BR`, `AR_BR`

**Expression**:

```
# Intermediate terms
h_minus_o = high - open
o_minus_l = open - low
prev_close = close.shift(1)
h_minus_pc_pos = max(0, high - prev_close)
pc_minus_l_pos = max(0, prev_close - low)

# AR and BR over an N-day rolling window
AR = sum(h_minus_o, N) / sum(o_minus_l, N) * 100
BR = sum(h_minus_pc_pos, N) / sum(pc_minus_l_pos, N) * 100
AR_BR = AR - BR
```

---

## III. Technical Indicator Factors

### 5. turnover_rate_n

**Factor name**: `turnover_rate_mean_5`

**Expression**:

```
turnover_rate_mean_5 = mean(turnover_rate, window=5)
```

### 6. variance_n

**Factor name**: `variance_20`

**Expression**:

```
variance_20 = var(pct_chg, window=20)
```

### 7. bbi_ratio_factor

**Factor name**: `bbi_ratio_factor`

**Expression**:

```
SMA3 = mean(close, 3)
SMA6 = mean(close, 6)
SMA12 = mean(close, 12)
SMA24 = mean(close, 24)
BBI = (SMA3 + SMA6 + SMA12 + SMA24) / 4
bbi_ratio_factor = BBI / close
```

---

## IV. Deviation Factors

### 8. daily_deviation

**Factor name**: `daily_deviation`

**Expression**:

```
# Daily momentum benchmarks
daily_positive_benchmark = mean(pct_chg[pct_chg > 0])  # mean pct_chg of the stocks that rose that day
daily_negative_benchmark = mean(pct_chg[pct_chg < 0])  # mean pct_chg of the stocks that fell that day

# Deviation
if pct_chg > 0 and daily_positive_benchmark > 0:
    daily_deviation = pct_chg - daily_positive_benchmark
elif pct_chg < 0 and daily_negative_benchmark < 0:
    daily_deviation = pct_chg - daily_negative_benchmark
else:
    daily_deviation = 0
```

### 9. daily_industry_deviation

**Factor name**: `daily_industry_deviation`

**Expression**:

```
# Daily industry momentum benchmarks
daily_industry_positive_benchmark = mean(pct_chg[pct_chg > 0])  # grouped by trade_date + cat_l2_code
daily_industry_negative_benchmark = mean(pct_chg[pct_chg < 0])  # grouped by trade_date + cat_l2_code

# Industry deviation
if pct_chg > 0 and daily_industry_positive_benchmark > 0:
    daily_industry_deviation = pct_chg - daily_industry_positive_benchmark
elif pct_chg < 0 and daily_industry_negative_benchmark < 0:
    daily_industry_deviation = pct_chg - daily_industry_negative_benchmark
else:
    daily_industry_deviation = 0
```

---

## V. Rolling & Simple Factors

### 10. Factors generated by get_rolling_factor

#### Money flow factors

| Factor name | Expression |
|---------|--------|
| `lg_elg_net_buy_vol` | `(buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol)` |
| `flow_lg_elg_intensity` | `lg_elg_net_buy_vol / (vol + epsilon)` |
| `sm_net_buy_vol` | `buy_sm_vol - sell_sm_vol` |
| `flow_divergence_diff` | `sm_net_buy_vol - lg_elg_net_buy_vol` |
| `flow_divergence_ratio` | `sm_net_buy_vol / (lg_elg_net_buy_vol + sign(lg_elg_net_buy_vol) * epsilon + epsilon)` |
| `total_buy_vol` | `buy_sm_vol + buy_lg_vol + buy_elg_vol` |
| `lg_elg_buy_prop` | `(buy_lg_vol + buy_elg_vol) / (total_buy_vol + epsilon)` |
| `flow_struct_buy_change` | `diff(lg_elg_buy_prop, 1)` |
| `lg_elg_net_buy_vol_change` | `diff(lg_elg_net_buy_vol, 1)` |
| `flow_lg_elg_accel` | `diff(lg_elg_net_buy_vol_change, 1)` |

#### Chip distribution factors

| Factor name | Expression |
|---------|--------|
| `chip_concentration_range` | `(cost_95pct - cost_5pct) / (close + epsilon)` |
| `chip_skewness` | `(weight_avg - cost_50pct) / (cost_50pct + epsilon)` |
| `floating_chip_proxy` | `winner_rate * max(0, (close - cost_15pct) / (close + epsilon))` |
| `cost_support_15pct_change` | `pct_change(cost_15pct, 1) * 100` |
| `cat_winner_price_zone` | categorical: 1 = high-risk zone, 2 = low-potential zone, 3 = upper-middle profit zone, 4 = lower-middle loss zone |
| `flow_chip_consistency` | `lg_elg_net_buy_vol * price_near_low_support` |
| `profit_taking_vs_absorb` | `lg_elg_net_buy_vol * (winner_rate > 0.7)` |

#### Volatility factors

| Factor name | Expression |
|---------|--------|
| `upside_vol` | `std(pos_returns, window=20)` |
| `downside_vol` | `std(neg_returns, window=20)` |
| `vol_ratio` | `upside_vol / downside_vol` |
| `return_skew` | `skew(pct_chg, window=5)` |
| `return_kurtosis` | `kurt(pct_chg, window=5)` |

#### Volume factors

| Factor name | Expression |
|---------|--------|
| `volume_change_rate` | `mean(vol, 2) / mean(vol, 10) - 1` |
| `cat_volume_breakout` | `vol > max(vol, 5)` |
| `turnover_deviation` | `(turnover_rate - mean(turnover_rate, 3)) / std(turnover_rate, 3)` |
| `cat_turnover_spike` | `turnover_rate > mean(turnover_rate, 3) + 2 * std(turnover_rate, 3)` |
| `avg_volume_ratio` | `mean(volume_ratio, 3)` |
| `cat_volume_ratio_breakout` | `volume_ratio > max(volume_ratio, 5)` |
| `vol_spike` | `mean(vol, 20)` |
| `vol_std_5` | `std(pct_change(vol), 5)` |

#### Technical indicators

| Factor name | Expression |
|---------|--------|
| `atr_14` | `ATR(high, low, close, 14)` (TA-Lib) |
| `atr_6` | `ATR(high, low, close, 6)` (TA-Lib) |
| `obv` | `OBV(close, vol)` (TA-Lib) |
| `maobv_6` | `SMA(obv, 6)` (TA-Lib) |
| `rsi_3` | `RSI(close, 3)` (TA-Lib) |

#### Return factors

| Factor name | Expression |
|---------|--------|
| `return_5` | `close / close.shift(5) - 1` |
| `return_20` | `close / close.shift(20) - 1` |
| `std_return_5` | `std(pct_change(close), 5)` |
| `std_return_90` | `std(pct_change(close), 90)` |
| `std_return_90_2` | `std(pct_change(close.shift(10)), 90)` |

#### EMA factors

| Factor name | Expression |
|---------|--------|
| `act_factor1` | `atan((EMA(close,5)/EMA(close,5).shift(1)-1)*100) * 57.3 / 50` |
| `act_factor2` | `atan((EMA(close,13)/EMA(close,13).shift(1)-1)*100) * 57.3 / 40` |
| `act_factor3` | `atan((EMA(close,20)/EMA(close,20).shift(1)-1)*100) * 57.3 / 21` |
| `act_factor4` | `atan((EMA(close,60)/EMA(close,60).shift(1)-1)*100) * 57.3 / 10` |
| `rank_act_factor1` | `rank(act_factor1, pct=True)` |
| `rank_act_factor2` | `rank(act_factor2, pct=True)` |
| `rank_act_factor3` | `rank(act_factor3, pct=True)` |
| `log_circ_mv` | `log(circ_mv)` |

#### Alpha factors

| Factor name | Expression |
|---------|--------|
| `cov` | `cov(high, vol, window=5)` |
| `delta_cov` | `diff(cov, 5)` |
| `_stddev_close` | `std(close, 20)` |
| `_rank_stddev` | `rank(_stddev_close, pct=True)` |
| `alpha_22_improved` | `-1 * delta_cov * _rank_stddev` |
| `alpha_003` | `(close - open) / (high - low)` (0 when `high == low`) |
| `alpha_007` | `rank(rolling_corr(close, vol, 5), pct=True)` |
| `alpha_013` | `rank(sum(close, 5) - sum(close, 20), pct=True)` |

#### Chip factors

| Factor name | Expression |
|---------|--------|
| `vol_break` | `1 if (close > cost_85pct) & (volume_ratio > 2) else 0` |
| `weight_roc5` | `pct_change(weight_avg, 5)` |
| `price_cost_divergence` | `corr(pct_change(close), pct_change(weight_avg), 10)` |
| `smallcap_concentration` | `(1 / log_circ_mv) * (cost_85pct - cost_15pct)` |
| `cost_stability` | `std(weight_avg, 20) / mean(weight_avg, 20)` |
| `high_cost_break_days` | `sum(close > cost_95pct, 5)` |
| `liquidity_risk` | `(cost_95pct - cost_5pct) / mean(vol, 10)` |
| `turnover_std` | `std(turnover_rate, 20)` |
| `mv_volatility` | `turnover_std / log_circ_mv` |
| `volume_growth` | `pct_change(vol, 20)` |
| `mv_growth` | `volume_growth / log_circ_mv` |
| `momentum_factor` | `volume_change_rate + 0.5 * turnover_deviation` |
| `resonance_factor` | `vol_ratio * pct_chg` |
| `log_close` | `log(close)` |
| `cat_vol_spike` | `vol > 2 * vol_spike` |
| `up` | `(high - max(close, open)) / close` |
| `down` | `(min(close, open) - low) / close` |
| `obv_maobv_6` | `obv - maobv_6` |
| `std_return_5_over_std_return_90` | `std_return_5 / std_return_90` |
| `std_return_90_minus_std_return_90_2` | `std_return_90 - std_return_90_2` |
| `cat_af2` | `act_factor2 > act_factor1` |
| `cat_af3` | `act_factor3 > act_factor2` |
| `cat_af4` | `act_factor4 > act_factor3` |
| `act_factor5` | `act_factor1 + act_factor2 + act_factor3 + act_factor4` |
| `act_factor6` | `(act_factor1 - act_factor2) / sqrt(act_factor1^2 + act_factor2^2)` |
| `active_buy_volume_large` | `buy_lg_vol / net_mf_vol` |
| `active_buy_volume_big` | `buy_elg_vol / net_mf_vol` |
| `active_buy_volume_small` | `buy_sm_vol / net_mf_vol` |
| `buy_lg_vol_minus_sell_lg_vol` | `(buy_lg_vol - sell_lg_vol) / net_mf_vol` |
| `buy_elg_vol_minus_sell_elg_vol` | `(buy_elg_vol - sell_elg_vol) / net_mf_vol` |
| `ctrl_strength` | `(cost_85pct - cost_15pct) / (his_high - his_low)` |
| `low_cost_dev` | `(close - cost_5pct) / (cost_50pct - cost_5pct)` |
| `asymmetry` | `(cost_95pct - cost_50pct) / (cost_50pct - cost_5pct)` |
| `lock_factor` | `turnover_rate * (1 - (cost_95pct - cost_5pct) / (his_high - his_low))` |
| `cat_vol_break` | `(close > cost_85pct) & (volume_ratio > 2)` |
| `cost_atr_adj` | `(cost_95pct - cost_5pct) / atr_14` |
| `cat_golden_resonance` | `(close > weight_avg) & (volume_ratio > 1.5) & (winner_rate > 0.7)` |
| `mv_turnover_ratio` | `turnover_rate / log_circ_mv` |
| `mv_adjusted_volume` | `vol / log_circ_mv` |
| `mv_weighted_turnover` | `turnover_rate / log_circ_mv` |
| `nonlinear_mv_volume` | `vol / log_circ_mv` |
| `mv_volume_ratio` | `volume_ratio / log_circ_mv` |
| `mv_momentum` | `turnover_rate * volume_ratio / log_circ_mv` |

---

## VI. Money Flow Factors

### 11. lg_flow_mom_corr_20_60

**Expression**:

```
net_lg_flow_val = (buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol) * close
rolling_net_lg_flow = sum(net_lg_flow_val, 20)
price_mom = pct_change(close, 20)
lg_flow_mom_corr_20_60 = corr(rolling_net_lg_flow, price_mom, 60)
```

### 12. lg_flow_accel

**Expression**:

```
net_lg_flow_vol = buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol
lg_flow_accel = diff(diff(net_lg_flow_vol, 1), 1)
```

### 13. profit_pressure

**Expression**:

```
profit_margin_85 = close / cost_85pct - 1
profit_margin_95 = close / cost_95pct - 1
profit_pressure = winner_rate * 0.5 * (profit_margin_85 + profit_margin_95)
```

### 14. underwater_resistance

**Expression**:

```
underwater_ratio = 1.0 - winner_rate
dist_to_cost_15 = max(0, cost_15pct - close) / (close + epsilon)
underwater_resistance = underwater_ratio * dist_to_cost_15
```

### 15. cost_conc_std_20

**Expression**:

```
cost_range_norm = (cost_85pct - cost_15pct) / (weight_avg + epsilon)
cost_conc_std_20 = std(cost_range_norm, 20)
```

### 16. profit_decay_20

**Expression**:

```
ret_20 = close / close.shift(20) - 1
winner_rate_change_20 = diff(winner_rate, 20)
profit_decay_20 = ret_20 / winner_rate_change_20
```

### 17. vol_amp_loss_20

**Expression**:

```
vol_20 = std(pct_chg, 20)
loss_degree = max(0, weight_avg - close) / (close + epsilon)
vol_amp_loss_20 = vol_20 * loss_degree
```

### 18. vol_drop_profit_cnt_5

**Expression**:

```
is_profitable = close > weight_avg * (1 + 0.1)
is_dropping = pct_chg < -0.03
rolling_mean_vol = mean(vol, 20)
rolling_std_vol = std(vol, 20)
is_high_vol = vol > (rolling_mean_vol + 2 * rolling_std_vol)
event = is_profitable & is_dropping & is_high_vol
vol_drop_profit_cnt_5 = sum(event, 5)
```

### 19. lg_flow_vol_interact_20

**Expression**:

```
vol_20 = std(pct_chg, 20)
net_lg_flow_val = (buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol) * close
total_val = vol * close
abs_net_lg_flow_ratio = abs(net_lg_flow_val) / (total_val + epsilon)
abs_net_lg_flow_ratio_20 = mean(abs_net_lg_flow_ratio, 20)
lg_flow_vol_interact_20 = vol_20 * abs_net_lg_flow_ratio_20
```

### 20. cost_break_confirm_cnt_5

**Expression**:

```
prev_cost_85 = cost_85pct.shift(1)
prev_cost_15 = cost_15pct.shift(1)
break_up = close > prev_cost_85
break_down = close < prev_cost_15
net_lg_flow_vol = buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol
confirm_up = break_up & (net_lg_flow_vol > 0)
confirm_down = break_down & (net_lg_flow_vol < 0)
net_confirm = confirm_up - confirm_down  # booleans counted as 1/0
cost_break_confirm_cnt_5 = sum(net_confirm, 5)
```

### 21. atr_norm_channel_pos_14

**Expression**:

```
prev_close = close.shift(1)
tr = max(high - low, abs(high - prev_close), abs(low - prev_close))
atr_14 = mean(tr, 14)
roll_low_14 = min(low, 14)
atr_norm_channel_pos_14 = (close - roll_low_14) / atr_14
```

### 22. turnover_diff_skew_20

**Expression**:

```
turnover_diff = diff(turnover_rate, 1)
turnover_diff_skew_20 = skew(turnover_diff, 20)
```

### 23. lg_sm_flow_diverge_20

**Expression**:

```
lg_flow_ratio = (buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol) / vol
sm_flow_ratio = (buy_sm_vol - sell_sm_vol) / vol
lg_flow_ratio_20 = mean(lg_flow_ratio, 20)
sm_flow_ratio_20 = mean(sm_flow_ratio, 20)
lg_sm_flow_diverge_20 = lg_flow_ratio_20 - sm_flow_ratio_20
```

### 24. pullback_strong_20_20

**Expression**:

```
high_20 = max(high, 20)
pullback_depth = (high_20 - close) / high_20
recent_gain_20 = close / close.shift(20) - 1
pullback_strong_20_20 = pullback_depth / recent_gain_20
```

### 25. vol_wgt_hist_pos_20

**Expression**:

```
hist_pos = (close - his_low) / (his_high - his_low)
rolling_mean_vol_20 = mean(vol, 20)
vol_rel_strength = vol / rolling_mean_vol_20
vol_wgt_hist_pos_20 = hist_pos * vol_rel_strength
```

### 26. vol_adj_roc_20

**Expression**:

```
roc_20 = close / close.shift(20) - 1
vol_20 = std(pct_chg, 20)
vol_adj_roc_20 = roc_20 / vol_20
```

---

## VII. Cross-Sectional Rank Factors

### 27. cs_rank_net_lg_flow_val

**Expression**:

```
net_lg_flow_val = (buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol) * close
cs_rank_net_lg_flow_val = rank(net_lg_flow_val, pct=True)
```

### 28. cs_rank_flow_divergence

**Expression**:

```
lg_ratio = (buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol) / vol
sm_ratio = (buy_sm_vol - sell_sm_vol) / vol
divergence = lg_ratio - sm_ratio
cs_rank_flow_divergence = rank(divergence, pct=True)
```

### 29. cs_rank_ind_adj_lg_flow

**Expression**:

```
net_lg_flow_vol = (buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol) * close
industry_avg_flow = mean(net_lg_flow_vol)  # grouped by trade_date + cat_l2_code
deviation = net_lg_flow_vol - industry_avg_flow
cs_rank_ind_adj_lg_flow = rank(deviation, pct=True)
```

### 30. cs_rank_elg_buy_ratio

**Expression**:

```
elg_buy_ratio = buy_elg_vol / vol
cs_rank_elg_buy_ratio = rank(elg_buy_ratio, pct=True)
```

### 31. cs_rank_rel_profit_margin

**Expression**:

```
profit_margin = (close - weight_avg) / close
cs_rank_rel_profit_margin = rank(profit_margin, pct=True)
```

### 32. cs_rank_cost_breadth

**Expression**:

```
cost_breadth = (cost_85pct - cost_15pct) / weight_avg
cs_rank_cost_breadth = rank(cost_breadth, pct=True)
```

### 33. cs_rank_dist_to_upper_cost

**Expression**:

```
dist_to_95 = close / cost_95pct
cs_rank_dist_to_upper_cost = rank(dist_to_95, pct=True)
```

### 34. cs_rank_winner_rate

**Expression**:

```
cs_rank_winner_rate = rank(winner_rate, pct=True)
```

### 35. cs_rank_intraday_range

**Expression**:

```
norm_range = (high - low) / close
cs_rank_intraday_range = rank(norm_range, pct=True)
```

### 36. cs_rank_close_pos_in_range

**Expression**:

```
close_pos = (close - low) / (high - low)
cs_rank_close_pos_in_range = rank(close_pos, pct=True)
```

### 37. cs_rank_opening_gap

**Expression**:

```
gap = open / pre_close - 1
cs_rank_opening_gap = rank(gap, pct=True)
```

### 38. cs_rank_pos_in_hist_range

**Expression**:

```
hist_pos = (close - his_low) / (his_high - his_low)
cs_rank_pos_in_hist_range = rank(hist_pos, pct=True)
```

### 39. cs_rank_vol_x_profit_margin

**Expression**:

```
daily_vol = abs(pct_chg)
profit_margin = (close - weight_avg) / close
interaction = daily_vol * profit_margin
cs_rank_vol_x_profit_margin = rank(interaction, pct=True)
```

### 40. cs_rank_lg_flow_price_concordance

**Expression**:

```
net_lg_flow_vol = buy_lg_vol + buy_elg_vol - sell_lg_vol - sell_elg_vol
concordance = net_lg_flow_vol * pct_chg
cs_rank_lg_flow_price_concordance = rank(concordance, pct=True)
```

### 41. cs_rank_turnover_per_winner

**Expression**:

```
turnover_per_winner = turnover_rate / winner_rate
cs_rank_turnover_per_winner = rank(turnover_per_winner, pct=True)
```

### 42. cs_rank_ind_cap_neutral_pe

**Expression**: placeholder (requires an implementation using `statsmodels`)

### 43. cs_rank_volume_ratio

**Expression**:

```
cs_rank_volume_ratio = rank(volume_ratio, pct=True)
```

### 44. cs_rank_elg_buy_sell_sm_ratio

**Expression**:

```
ratio = buy_elg_vol / sell_sm_vol
cs_rank_elg_buy_sell_sm_ratio = rank(ratio, pct=True)
```

### 45. cs_rank_cost_dist_vol_ratio

**Expression**:

```
dist = abs(close - weight_avg) / (close + epsilon)
interaction = dist * volume_ratio
cs_rank_cost_dist_vol_ratio = rank(interaction, pct=True)
```

### 46. cs_rank_size

**Expression**:

```
log_circ_mv = log1p(circ_mv)
cs_rank_size = rank(log_circ_mv, pct=True)
```

---

## VIII. Industry Factors

### 47. get_act_factor (from main.utils.factor)

**Generated factors**: `act_factor1`, `act_factor2`, `act_factor3`, `act_factor4`

**Expression**:

```
obv = OBV(close, vol)
return_5 = close / close.shift(5) - 1
return_20 = close / close.shift(20) - 1
return_5_percentile = rank(return_5, pct=True)
return_20_percentile = rank(return_20, pct=True)
```

**Column renaming**: industry factor columns are prefixed with `industry_`

---

## Appendix: Notation

| Symbol | Meaning |
|------|------|
| `epsilon` | A tiny constant (1e-10) that prevents division by zero |
| `mean(x, N)` | N-period rolling mean |
| `std(x, N)` | N-period rolling standard deviation |
| `var(x, N)` | N-period rolling variance |
| `sum(x, N)` | N-period rolling sum |
| `max(x, N)` | N-period rolling maximum |
| `min(x, N)` | N-period rolling minimum |
| `max(0, x)` | Element-wise maximum with 0 (negative values clipped to 0) |
| `diff(x, N)` | N-period difference (`x - x.shift(N)`) |
| `pct_change(x, N)` | N-period percentage change (`x / x.shift(N) - 1`) |
| `shift(x, N)`, `x.shift(N)` | N-period lag (the value from N rows earlier) |
| `rank(x, pct=True)` | Cross-sectional percentile rank (across stocks on the same day) |
| `corr(x, y, N)` | N-period rolling correlation |
| `cov(x, y, N)` | N-period rolling covariance |
| `skew(x, N)` | N-period rolling skewness |
| `kurt(x, N)` | N-period rolling kurtosis |
| `ATR` | Average True Range (TA-Lib) |
| `OBV` | On-Balance Volume (TA-Lib) |
| `RSI` | Relative Strength Index (TA-Lib) |
| `SMA` | Simple Moving Average (TA-Lib) |
| `EMA` | Exponential Moving Average (TA-Lib) |
| `atan` | Arctangent |

---

*Document generated: 2026-03-06*

*180+ factors documented in total*
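The backward `merge_asof` matching used by the `add_financial_factor` series (section I) can be illustrated with a small pandas sketch. It attaches the latest announced `roe` to each trading day. The helper name `attach_latest_financial`, the `ann_date` and `ts_code` column names, and the toy data are assumptions for illustration, not the project's actual code.

```python
import pandas as pd


def attach_latest_financial(daily: pd.DataFrame, fina: pd.DataFrame, col: str) -> pd.DataFrame:
    """Attach the most recently announced value of `col` to every trading day (hypothetical helper)."""
    # merge_asof requires both frames to be sorted on their "on" keys
    left = daily.sort_values("trade_date")
    right = fina.sort_values("ann_date")[["ts_code", "ann_date", col]]
    merged = pd.merge_asof(
        left,
        right,
        left_on="trade_date",
        right_on="ann_date",
        by="ts_code",          # only match records of the same stock
        direction="backward",  # latest ann_date <= trade_date
    )
    return merged.drop(columns="ann_date")


# Hypothetical toy data: four trading days and two announcements for one stock
daily = pd.DataFrame({
    "ts_code": ["000001.SZ"] * 4,
    "trade_date": pd.to_datetime(["2024-04-25", "2024-04-26", "2024-04-29", "2024-04-30"]),
})
fina = pd.DataFrame({
    "ts_code": ["000001.SZ", "000001.SZ"],
    "ann_date": pd.to_datetime(["2024-03-15", "2024-04-29"]),
    "roe": [10.5, 2.8],
})
out = attach_latest_financial(daily, fina, "roe")
```

With `merge_asof`'s default `allow_exact_matches=True`, an announcement dated on a trading day is already visible on that same day. Whether that is acceptable depends on when the data actually becomes available.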
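The ARBR formulas in section II map almost line for line onto pandas rolling sums for a single stock's daily bars. This sketch makes several assumptions: the DataFrame is sorted by `trade_date` and has `open`, `high`, `low`, `close` columns, and the function name `calc_arbr` is hypothetical. The default window `n=26` is also an assumption (a common ARBR setting), because the document leaves `N` unspecified.

```python
import pandas as pd


def calc_arbr(df: pd.DataFrame, n: int = 26) -> pd.DataFrame:
    """Return a copy of one stock's bars with AR, BR and AR_BR columns (sketch)."""
    prev_close = df["close"].shift(1)
    h_minus_o = df["high"] - df["open"]
    o_minus_l = df["open"] - df["low"]
    # max(0, x) from the formulas; clip() leaves the first row's NaN untouched
    h_minus_pc_pos = (df["high"] - prev_close).clip(lower=0)
    pc_minus_l_pos = (prev_close - df["low"]).clip(lower=0)

    out = df.copy()
    out["AR"] = h_minus_o.rolling(n).sum() / o_minus_l.rolling(n).sum() * 100
    out["BR"] = h_minus_pc_pos.rolling(n).sum() / pc_minus_l_pos.rolling(n).sum() * 100
    out["AR_BR"] = out["AR"] - out["BR"]
    return out


bars = pd.DataFrame({
    "open": [10.0, 10.0, 10.0, 10.0],
    "high": [11.0, 12.0, 11.0, 12.0],
    "low": [9.0, 9.0, 9.0, 8.0],
    "close": [10.0, 11.0, 10.0, 11.0],
})
res = calc_arbr(bars, n=2)
```

BR needs one more row than AR before it becomes valid, because `prev_close` is missing on the first bar.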
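For `bbi_ratio_factor` (section III), BBI is simply the average of four simple moving averages of `close`. Below is a minimal sketch with `Series.rolling`; the function name `bbi_ratio` is hypothetical.

```python
import pandas as pd


def bbi_ratio(close: pd.Series) -> pd.Series:
    """BBI / close, where BBI averages the 3-, 6-, 12- and 24-day SMAs (sketch)."""
    bbi = sum(close.rolling(n).mean() for n in (3, 6, 12, 24)) / 4
    return bbi / close


ratio = bbi_ratio(pd.Series([5.0] * 30))
```

The first 23 values are `NaN` because the 24-day SMA needs a full window. For a constant price, the ratio is 1.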
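The deviation factors in section IV compare each stock's `pct_chg` with a same-sign benchmark: the mean of the rising stocks, or the mean of the falling stocks, in its group. Here is a sketch using `groupby().transform`, where `deviation_vs_benchmark` is a hypothetical name. Grouping by `trade_date` corresponds to `daily_deviation`. Passing `("trade_date", "cat_l2_code")` corresponds to `daily_industry_deviation`.

```python
import numpy as np
import pandas as pd


def deviation_vs_benchmark(df: pd.DataFrame, group_cols=("trade_date",)) -> pd.Series:
    """pct_chg minus the same-sign group mean; 0 when no benchmark applies (sketch)."""
    g = df.groupby(list(group_cols))["pct_chg"]
    # Mean of rising / falling stocks in each group (NaN if the group has none)
    pos_bench = g.transform(lambda s: s[s > 0].mean())
    neg_bench = g.transform(lambda s: s[s < 0].mean())
    pct = df["pct_chg"]
    dev = np.where(
        (pct > 0) & (pos_bench > 0),
        pct - pos_bench,
        np.where((pct < 0) & (neg_bench < 0), pct - neg_bench, 0.0),
    )
    return pd.Series(dev, index=df.index, name="deviation")


toy = pd.DataFrame({
    "trade_date": ["d1", "d1", "d1", "d1", "d2", "d2"],
    "pct_chg": [2.0, 4.0, -1.0, -3.0, 0.0, 5.0],
})
dev = deviation_vs_benchmark(toy)
```

On `d2` no stock falls, so the negative benchmark is `NaN`. The comparison `NaN < 0` is `False`, which sends the flat stock to the 0 branch.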
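The `act_factor1` to `act_factor4` columns (section V) turn the one-day percentage slope of an EMA into an angle. `atan` returns radians, and 57.3 (about 180/π) converts them to degrees before the per-window divisor is applied. This sketch makes two assumptions. First, the name `act_angle` is hypothetical. Second, it uses pandas `ewm(span=..., adjust=False)` as a stand-in for TA-Lib's `EMA`; the two are initialized differently, so the earliest rows will not match exactly.

```python
import numpy as np
import pandas as pd


def act_angle(close: pd.Series, span: int, scale: float) -> pd.Series:
    """atan(percent change of EMA) in degrees, divided by `scale` (sketch)."""
    ema = close.ewm(span=span, adjust=False).mean()  # stand-in for TA-Lib EMA
    slope_pct = (ema / ema.shift(1) - 1) * 100
    return np.arctan(slope_pct) * 57.3 / scale


# act_factor1 uses span 5 and divisor 50; act_factor2 uses 13 and 40, and so on
flat = act_angle(pd.Series([10.0] * 10), span=5, scale=50)
rising = act_angle(pd.Series([10.0 + i for i in range(10)]), span=5, scale=50)
```

Because `atan` is bounded, each `act_factor` stays within ±90 degrees divided by its scale, however steep the EMA becomes.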
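`cost_break_confirm_cnt_5` (section VI) counts, over five days, breaks above the previous day's `cost_85pct` or below its `cost_15pct` that large-order net flow confirms. Below is a single-stock sketch. It assumes rows are sorted by `trade_date`, and the function name is hypothetical. The boolean flags are cast to integers before they are subtracted.

```python
import pandas as pd


def cost_break_confirm_cnt(df: pd.DataFrame, window: int = 5) -> pd.Series:
    """Net count of flow-confirmed breaks of yesterday's cost lines (sketch)."""
    prev_cost_85 = df["cost_85pct"].shift(1)
    prev_cost_15 = df["cost_15pct"].shift(1)
    net_lg_flow_vol = df["buy_lg_vol"] + df["buy_elg_vol"] - df["sell_lg_vol"] - df["sell_elg_vol"]
    # Comparisons against NaN (first row) evaluate to False
    confirm_up = (df["close"] > prev_cost_85) & (net_lg_flow_vol > 0)
    confirm_down = (df["close"] < prev_cost_15) & (net_lg_flow_vol < 0)
    # Cast to 0/1 so the subtraction is an integer difference
    net_confirm = confirm_up.astype(int) - confirm_down.astype(int)
    return net_confirm.rolling(window).sum()


toy = pd.DataFrame({
    "close": [10.0, 12.0, 8.0, 12.0, 10.0, 13.0],
    "cost_85pct": [11.0] * 6,
    "cost_15pct": [9.0] * 6,
    "buy_lg_vol": [0.0, 5.0, 0.0, 1.0, 0.0, 2.0],
    "buy_elg_vol": [0.0] * 6,
    "sell_lg_vol": [0.0, 0.0, 5.0, 3.0, 0.0, 0.0],
    "sell_elg_vol": [0.0] * 6,
})
cnt = cost_break_confirm_cnt(toy)
```

On the fourth row the close is above the cost line, but net flow is negative, so that break is not counted.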
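`atr_norm_channel_pos_14` (section VI) measures how far the close sits above the N-day low, in units of ATR. In this formula, ATR is defined as a simple rolling mean of true range. That differs from TA-Lib's Wilder-smoothed `ATR` used for `atr_14` in section V. Below is a sketch with a hypothetical function name. Because `prev_close` is missing on the first row, that row's true range falls back to `high - low`.

```python
import pandas as pd


def atr_norm_channel_pos(df: pd.DataFrame, n: int = 14) -> pd.Series:
    """(close - rolling low) / simple-mean ATR for one stock (sketch)."""
    prev_close = df["close"].shift(1)
    tr = pd.concat(
        [
            df["high"] - df["low"],
            (df["high"] - prev_close).abs(),
            (df["low"] - prev_close).abs(),
        ],
        axis=1,
    ).max(axis=1)  # NaNs are skipped, so row 0 uses high - low
    atr = tr.rolling(n).mean()
    roll_low = df["low"].rolling(n).min()
    return (df["close"] - roll_low) / atr


bars = pd.DataFrame({
    "high": [11.0, 12.0, 13.0],
    "low": [9.0, 10.0, 11.0],
    "close": [10.0, 11.0, 12.0],
})
pos = atr_norm_channel_pos(bars, n=2)
```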
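Most section VII factors apply `rank(x, pct=True)` within each `trade_date`. `cs_rank_ind_adj_lg_flow` first demeans the flow by its industry average. Below is a sketch that assumes a long-format DataFrame with one row per stock per day; the function name matches the factor name, but the code itself is illustrative.

```python
import pandas as pd


def cs_rank_ind_adj_lg_flow(df: pd.DataFrame) -> pd.Series:
    """Per-day percentile rank of industry-demeaned large-order net flow value (sketch)."""
    net_val = (df["buy_lg_vol"] + df["buy_elg_vol"] - df["sell_lg_vol"] - df["sell_elg_vol"]) * df["close"]
    # Industry mean for each (trade_date, cat_l2_code), broadcast back to every row
    ind_avg = net_val.groupby([df["trade_date"], df["cat_l2_code"]]).transform("mean")
    deviation = net_val - ind_avg
    # Percentile rank within each trade_date; ties receive the average rank
    return deviation.groupby(df["trade_date"]).rank(pct=True)


toy = pd.DataFrame({
    "trade_date": ["d1", "d1", "d1", "d1", "d2", "d2"],
    "cat_l2_code": ["A", "A", "B", "B", "A", "B"],
    "buy_lg_vol": [10.0, 20.0, 100.0, 100.0, 1.0, 3.0],
    "buy_elg_vol": [0.0] * 6,
    "sell_lg_vol": [0.0] * 6,
    "sell_elg_vol": [0.0] * 6,
    "close": [1.0] * 6,
})
ranks = cs_rank_ind_adj_lg_flow(toy)
```

Both `B` stocks on `d1` have the largest raw flow, yet they rank in the middle: each has zero deviation from its own industry mean.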