Source-Code Analysis: Why MyBatis foreach Can Cause Performance Problems


Published: 2021-04-05

Author: 欧阳思海


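This article is included in the GitHub repository https://github.com/OUYANGSIHAI/JavaInterview, a summary of Java interview material for top-tier companies that took me six months to put together and that helped me land offers from large companies. Stars are welcome.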

Original link: blog.ouyangsihai.cn >> Source-Code Analysis: Why MyBatis foreach Can Cause Performance Problems

Author: TSMYK, my.oschina.net/mengyuankan/blog/2989154

Background

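Recently I was building something like a consolidated report that has to query all records (the number of records in the database is bounded), roughly 10,000 of them. The report needs data from three tables, so those 10,000 IDs are used to query the database three times. One of the queries is SQL I wrote myself; the other two go through interfaces provided by other people.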

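At first I did not think much about it and simply used an IN query built with MyBatis's foreach. The project uses jsonrpc to request data, and during testing the requests kept coming back empty. The log showed a jsonrpc timeout exception, and reading further revealed that the calls were blocked in the three SQL queries above.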

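From reading the MyBatis source earlier, I knew that foreach can have performance problems, so I changed the SQL: the list is concatenated in code and read in the MyBatis mapper directly with #. After swapping in the new class and testing, the data came back right away.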

Premise

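Whether IN is a good idea at all, how to optimize it, or how to replace it with exists or an inner join is out of scope here. The only question is how to optimize an IN clause that is built with MyBatis's foreach. The optimization is simple: build the part that follows in inside the application code, then use it in the mapper file as a plain string via #{xxx} or ${xxx}.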

Test

Before analyzing the foreach source code, let's set up some data and see how large the difference is.

Table definition:


CREATE TABLE person
(
    id int(11) PRIMARY KEY NOT NULL,
    name varchar(50),
    age int(11),
    job varchar(50)
);

Insert 10,000 rows:

POJO class:


@Getter
@Setter
@ToString
@NoArgsConstructor
@AllArgsConstructor
public class Person implements Serializable {
    private int id;
    private String name;
    private String job;
    private int age;
}

Approach 1

The original way, using the foreach tag:

1. Define the method in the DAO:


List<Person> queryPersonByIds(@Param("ids") List<Integer> ids);

2. SQL in the mapper file:


<select id="queryPersonByIds" parameterType="list" resultMap="queryPersonMap">
    select * from person where 1=1
    <if test="ids != null and ids.size() > 0">
        and id in
        <foreach collection="ids" item="item" index="index" separator="," open="(" close=")">
            #{item}
        </foreach>
    </if>
</select>

3. Run the test method:


@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(locations = { "classpath:spring-mybatis.xml" })
public class MainTest {

    @Autowired
    private IPersonService personService;

    @Test
    public void test(){
        // Build 10,000 IDs
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 10000; i++) {
            ids.add(i);
        }
        long start = System.currentTimeMillis();

        // Run the query three times
        personService.queryPersonByIds(ids);
        personService.queryPersonByIds(ids);
        personService.queryPersonByIds(ids);

        long end = System.currentTimeMillis();
        System.out.println(String.format("Elapsed: %d ms", end - start));
    }
}

Result: Elapsed: 2853 ms

As shown, the foreach approach takes about 3 s.

Approach 2

Build the IN list in code and read it in the mapper file with ${xxx}:

1. Add the method to the DAO:


List<Person> queryPersonByIds2(@Param("ids") String ids);

2. SQL in the mapper file:


<select id="queryPersonByIds2" parameterType="String" resultMap="queryPersonMap">
    select * from person where 1=1
    <if test="ids != null and ids != ''">
      and id in ${ids}
    </if>
</select>

3. Run the test method:


@Test
public void test_2(){
    // Build the IN list in code
    StringBuffer sb = new StringBuffer();
    sb.append("(");
    for (int i = 1; i < 10000; i++) {
        sb.append(i).append(",");
    }
    // Drop the trailing comma
    sb.deleteCharAt(sb.length() - 1);
    sb.append(")");
    // The resulting string is (1,2,3,4,5...)

    long start2 = System.currentTimeMillis();

    // Run the query three times
    personService.queryPersonByIds2(sb.toString());
    personService.queryPersonByIds2(sb.toString());
    personService.queryPersonByIds2(sb.toString());

    long end2 = System.currentTimeMillis();
    System.out.println(String.format("Elapsed: %d ms", end2 - start2));
}

Result: Elapsed: 360 ms

With the list built in code and read through ${xxx}, executing the same SQL takes about 360 ms.
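
The string building in the test above can also be written with a stream collector. The following is only a sketch: InClauseBuilder and toInClause are hypothetical names, not part of MyBatis or of the original project, and it assumes the IDs are integers. Because ${xxx} splices raw text into the SQL, this style should only be used with values the application fully controls.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of MyBatis): renders integer IDs as "(1,2,3)".
final class InClauseBuilder {

    private InClauseBuilder() {
    }

    static String toInClause(List<Integer> ids) {
        if (ids == null || ids.isEmpty()) {
            // "id in ()" is not valid SQL, so refuse to build it.
            throw new IllegalArgumentException("ids must not be empty");
        }
        return ids.stream()
                .map(String::valueOf)
                // joining(delimiter, prefix, suffix) adds the parentheses for us
                .collect(Collectors.joining(",", "(", ")"));
    }

    public static void main(String[] args) {
        // prints (1,2,3)
        System.out.println(toInClause(Arrays.asList(1, 2, 3)));
    }
}
```

The result of toInClause(ids) could then be passed to queryPersonByIds2 in place of sb.toString().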

Approach 3

Build the ID list in code and read it in the mapper file with #{xxx}:

1. Add the method to the DAO:


List<Person> queryPersonByIds3(@Param("ids") String ids);

2. SQL in the mapper file:


<select id="queryPersonByIds3" parameterType="String" resultMap="queryPersonMap">
    select * from person where 1=1
    <if test="ids != null and ids != ''">
        and id in (#{ids})
    </if>
</select>

3. Run the test method:


@Test
public void test_3(){
    // Build the ID list in code
    StringBuffer sb2 = new StringBuffer();
    for (int i = 1; i < 10000; i++) {
        sb2.append(i).append(",");
    }
    // Drop the trailing comma
    sb2.deleteCharAt(sb2.length() - 1);
    // The resulting string is 1,2,3,4,5....

    long start3 = System.currentTimeMillis();

    personService.queryPersonByIds3(sb2.toString());
    personService.queryPersonByIds3(sb2.toString());
    personService.queryPersonByIds3(sb2.toString());

    long end3 = System.currentTimeMillis();
    System.out.println(String.format("Elapsed: %d ms", end3 - start3));
}

Result: Elapsed: 30 ms

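With the list built in code and read through #{xxx}, the call takes about 30 ms. Note, however, that #{ids} binds the whole string as a single parameter, so the statement sent is id in (?) with one string value rather than a list of IDs. It is therefore not the same query as in approaches 1 and 2, and the timing is not directly comparable.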

Summary

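The three approaches above show that the choice makes quite a large difference in elapsed time. The fastest is building the list in code and passing it through #{xxx} as a string; the slowest is foreach.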

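Why foreach is so much slower is covered in the source-code analysis later on. Here, both remaining approaches build the list in code, yet #{xxx} and ${xxx} differ by roughly a factor of 10. MyBatis handles the two symbols # and $ with different strategies.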

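Anyone who has used JDBC knows there are two ways to execute SQL: a Statement object and a PreparedStatement object. A PreparedStatement represents precompiled SQL: the parameters in the SQL are written as ? placeholders and assigned afterwards with the setXXX methods. With a Statement, every SQL command is parsed and compiled each time it is executed, so PreparedStatement is the more efficient of the two. MyBatis treats # and $ along the same lines: # becomes a ? placeholder in a PreparedStatement whose value is set afterwards, whereas $ is substituted directly into the SQL text, which is what you would do by hand with a plain Statement. That is why the two approaches take different amounts of time even though both build the list in code.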
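
The difference between the two symbols can be illustrated with a toy model. The sketch below is not MyBatis code: PlaceholderDemo, substituteDollar and toPreparedSql are hypothetical names, and the regular expressions only handle simple names such as ${ids} or #{ids}. It shows that $ produces final SQL text, while # produces a ? together with a list of parameters still to be bound.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of the two substitution styles (assumption: simple names only).
final class PlaceholderDemo {

    private static final Pattern DOLLAR = Pattern.compile("\\$\\{(\\w+)\\}");
    private static final Pattern HASH = Pattern.compile("#\\{(\\w+)\\}");

    private PlaceholderDemo() {
    }

    // ${name}: the value is spliced into the SQL text as-is.
    static String substituteDollar(String sql, Map<String, Object> params) {
        Matcher m = DOLLAR.matcher(sql);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = String.valueOf(params.get(m.group(1)));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    // #{name}: replaced by "?"; the name is recorded so its value can be bound later.
    static String toPreparedSql(String sql, List<String> paramOrder) {
        Matcher m = HASH.matcher(sql);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            paramOrder.add(m.group(1));
            m.appendReplacement(out, "?");
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> params = Collections.<String, Object>singletonMap("ids", "(1,2,3)");
        // prints id in (1,2,3)
        System.out.println(substituteDollar("id in ${ids}", params));

        List<String> order = new ArrayList<>();
        // prints id in (?)
        System.out.println(toPreparedSql("id in (#{ids})", order));
        // prints [ids]
        System.out.println(order);
    }
}
```

In this model, a single #{ids} always yields exactly one ?, however many IDs the string contains.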

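PS: only three approaches are covered above. Hopefully nobody will ask about building the string as (1,2,3,4,5) and then reading it in the mapper SQL with #{xxx}.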

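foreach source-code analysis

Let's look at how foreach is parsed and what the final SQL looks like:

In MyBatis, foreach is one of the dynamic tags, and one of the smartest. Each dynamic tag has a corresponding class that handles it; foreach is handled mainly by ForEachSqlNode.

ForEachSqlNode processes the foreach node. First, a reminder of how the foreach node is used: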


<select id="queryPersonByIds" parameterType="list" resultMap="queryPersonMap">
    select * from person where 1=1
    <if test="ids != null and ids.size() > 0">
        and id in
        <foreach collection="ids" item="item" index="index" separator="," open="(" close=")">
            #{item}
        </foreach>
    </if>
</select>

The SQL finally executed by the database is:


select  * from person where 1=1 and id in (1,2,3,4,5)

First, look at its two inner classes:

PrefixedContext

This class handles the prefix that is emitted before a fragment, such as the separator between items.


private class PrefixedContext extends DynamicContext {
    private DynamicContext delegate;
    // The prefix to emit
    private String prefix;
    // Whether the prefix has already been emitted
    private boolean prefixApplied;
    // .......

    @Override
    public void appendSql(String sql) {
      // Emit the prefix before the first non-blank fragment
      if (!prefixApplied && sql != null && sql.trim().length() > 0) {
        delegate.appendSql(prefix);
        prefixApplied = true;
      }
      // Append the SQL fragment
      delegate.appendSql(sql);
    }
}
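
The prefix-once behaviour can be tried in isolation. The sketch below is a simplified stand-in, not the MyBatis class: PrefixOnceBuffer is a hypothetical name, it writes into a StringBuilder instead of a delegate DynamicContext, and it treats a null fragment as a no-op.

```java
// Simplified stand-in for the prefix-once logic (hypothetical class, not MyBatis).
final class PrefixOnceBuffer {

    private final StringBuilder sql = new StringBuilder();
    private final String prefix;
    private boolean prefixApplied;

    PrefixOnceBuffer(String prefix) {
        this.prefix = prefix;
    }

    void appendSql(String fragment) {
        if (fragment == null) {
            return; // assumption of this sketch: ignore null fragments
        }
        // The prefix is emitted once, right before the first non-blank fragment.
        if (!prefixApplied && fragment.trim().length() > 0) {
            sql.append(prefix);
            prefixApplied = true;
        }
        sql.append(fragment);
    }

    boolean isPrefixApplied() {
        return prefixApplied;
    }

    @Override
    public String toString() {
        return sql.toString();
    }

    public static void main(String[] args) {
        PrefixOnceBuffer buffer = new PrefixOnceBuffer(",");
        buffer.appendSql("   ");      // blank: no prefix yet
        buffer.appendSql("#{item}");  // first real fragment: prefix goes in front
        buffer.appendSql("#{other}"); // prefix is not repeated
        // prints "   ,#{item}#{other}" (without the quotes)
        System.out.println(buffer);
    }
}
```

Blank fragments are passed through without triggering the prefix, which is why whitespace inside a foreach body does not produce stray separators.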

FilteredDynamicContext

FilteredDynamicContext handles the #{} placeholders. It does not bind any parameter; it only rewrites #{item} into a placeholder such as #{__frch_item_1}.


private static class FilteredDynamicContext extends DynamicContext {
    private DynamicContext delegate;
    // Index of the current item within the collection
    private int index;
    // Name of the index variable
    private String itemIndex;
    // Name of the item variable
    private String item;
    //.............
    // Rewrite #{item}
    @Override
    public void appendSql(String sql) {
      GenericTokenParser parser = new GenericTokenParser("#{", "}", new TokenHandler() {
        @Override
        public String handleToken(String content) {
          // Turn #{item} into something like #{__frch_item_1}
          String newContent = content.replaceFirst("^\\s*" + item + "(?![^.,:\\s])", itemizeItem(item, index));
          // Turn #{itemIndex} into something like #{__frch_itemIndex_1}
          if (itemIndex != null && newContent.equals(content)) {
            newContent = content.replaceFirst("^\\s*" + itemIndex + "(?![^.,:\\s])", itemizeItem(itemIndex, index));
          }
          // Return #{__frch_item_1} or #{__frch_itemIndex_1}
          return new StringBuilder("#{").append(newContent).append("}").toString();
        }
      });
      // Append the rewritten SQL
      delegate.appendSql(parser.parse(sql));
    }

    private static String itemizeItem(String item, int i) {
      return new StringBuilder("__frch_").append(item).append("_").append(i).toString();
    }
}
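
The regular expression in handleToken is easier to follow on concrete inputs. The sketch below copies only the replaceFirst call and itemizeItem from the listing above into a standalone class; ItemPlaceholderRewriter and rewrite are hypothetical names chosen for the example.

```java
// Standalone copy of the rewriting step (hypothetical wrapper class).
final class ItemPlaceholderRewriter {

    private ItemPlaceholderRewriter() {
    }

    static String itemizeItem(String item, int i) {
        return "__frch_" + item + "_" + i;
    }

    // content is the text between "#{" and "}".
    // The negative lookahead only lets the match through when the item name is
    // followed by '.', ',', ':', whitespace, or the end of the content.
    static String rewrite(String content, String item, int uniqueNumber) {
        return content.replaceFirst(
                "^\\s*" + item + "(?![^.,:\\s])",
                itemizeItem(item, uniqueNumber));
    }

    public static void main(String[] args) {
        // prints __frch_item_3
        System.out.println(rewrite("item", "item", 3));
        // prints __frch_item_0.name
        System.out.println(rewrite("item.name", "item", 0));
        // prints __frch_item_1,jdbcType=INTEGER
        System.out.println(rewrite("item,jdbcType=INTEGER", "item", 1));
        // prints items (a different name, so it is left untouched)
        System.out.println(rewrite("items", "item", 0));
    }
}
```

This pattern is compiled and applied once per placeholder per element, so its cost grows with the size of the collection.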

ForEachSqlNode

With its two inner classes covered, here is the implementation of ForEachSqlNode itself:


public class ForEachSqlNode implements SqlNode {
  public static final String ITEM_PREFIX = "__frch_";
  // Evaluates the collection expression that drives the loop
  private ExpressionEvaluator evaluator;
  // The collection to iterate over
  private String collectionExpression;
  // The child node
  private SqlNode contents;
  // Opening string
  private String open;
  // Closing string
  private String close;
  // Separator
  private String separator;
  // Name of the current element; for a Map, index is the key and item is the value
  private String item;
  // Name of the current iteration index
  private String index;
  private Configuration configuration;

  // ...............

  @Override
  public boolean apply(DynamicContext context) {
    // Fetch the bound parameters
    Map<String, Object> bindings = context.getBindings();
    final Iterable<?> iterable = evaluator.evaluateIterable(collectionExpression, bindings);
    if (!iterable.iterator().hasNext()) {
      return true;
    }
    boolean first = true;
    // Append the opening string
    applyOpen(context);
    int i = 0;
    for (Object o : iterable) {
      DynamicContext oldContext = context;
      if (first) {
        // First item of the collection: the prefix is the empty string
        context = new PrefixedContext(context, "");
      } else if (separator != null) {
        // A separator is configured: use it as the prefix
        context = new PrefixedContext(context, separator);
      } else {
        // No separator configured: default to the empty string
        context = new PrefixedContext(context, "");
      }
      int uniqueNumber = context.getUniqueNumber();
      if (o instanceof Map.Entry) {
        // For a Map, store the entry's key and value in the bindings
        Map.Entry<Object, Object> mapEntry = (Map.Entry<Object, Object>) o;
        // This is where index becomes the key and item becomes the value for a Map
        applyIndex(context, mapEntry.getKey(), uniqueNumber);
        applyItem(context, mapEntry.getValue(), uniqueNumber);
      } else {
        // Not a Map: store the element's position and the element itself in the bindings
        applyIndex(context, i, uniqueNumber);
        applyItem(context, o, uniqueNumber);
      }
      // Apply the child node through a FilteredDynamicContext, which rewrites the placeholders
      contents.apply(new FilteredDynamicContext(configuration, context, index, item, uniqueNumber));
      if (first) {
        first = !((PrefixedContext) context).isPrefixApplied();
      }
      context = oldContext;
      i++;
    }
    // Append the closing string
    applyClose(context);
    return true;
  }

  private void applyIndex(DynamicContext context, Object o, int i) {
    if (index != null) {
      context.bind(index, o); // key is the index name, value is the index (or the map key)
      context.bind(itemizeItem(index, i), o); // add the prefix and suffix to the index name to form a new key
    }
  }

  private void applyItem(DynamicContext context, Object o, int i) {
    if (item != null) {
      context.bind(item, o);
      context.bind(itemizeItem(item, i), o);
    }
  }
}

So for this example:


<select id="queryPersonByIds" parameterType="list" resultMap="queryPersonMap">
    select * from person where 1=1
    <if test="ids != null and ids.size() > 0">
        and id in
        <foreach collection="ids" item="item" index="index" separator="," open="(" close=")">
            #{item}
        </foreach>
    </if>
</select>

the SQL after parsing is:


select  *  from  person where  1=1 and id in (#{__frch_item_0},  #{__frch_item_1}, #{__frch_item_2}, #{__frch_item_3}, #{__frch_item_4})

The values are then assigned through the setXXX methods of PreparedStatement.
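
To make the expansion concrete, the loop in apply can be imitated in a few lines. This is a deliberately reduced sketch, not the MyBatis implementation: ForEachExpansionSketch and expand are hypothetical names, the body of the loop is assumed to be just #{item}, and the extra bindings MyBatis keeps for the plain item and index names are omitted.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Reduced imitation of the foreach expansion (hypothetical class, not MyBatis).
final class ForEachExpansionSketch {

    final StringBuilder sql = new StringBuilder();
    // One binding per element, keyed by the generated placeholder name.
    final Map<String, Object> bindings = new LinkedHashMap<>();

    static ForEachExpansionSketch expand(List<?> items, String item,
                                         String open, String separator, String close) {
        ForEachExpansionSketch result = new ForEachExpansionSketch();
        result.sql.append(open);
        int i = 0;
        for (Object o : items) {
            if (i > 0) {
                result.sql.append(separator);
            }
            String key = "__frch_" + item + "_" + i;
            result.bindings.put(key, o);
            result.sql.append("#{").append(key).append("}");
            i++;
        }
        result.sql.append(close);
        return result;
    }

    public static void main(String[] args) {
        ForEachExpansionSketch r = expand(Arrays.asList(1, 2, 3), "item", "(", ",", ")");
        // prints (#{__frch_item_0},#{__frch_item_1},#{__frch_item_2})
        System.out.println(r.sql);
        // prints {__frch_item_0=1, __frch_item_1=2, __frch_item_2=3}
        System.out.println(r.bindings);
    }
}
```

With 10,000 IDs the generated text contains 10,000 distinct placeholders, and each of them still has to be parsed and bound afterwards.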

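So MyBatis ends up turning foreach into the # form after all. Why is it still slow? Because every placeholder such as #{__frch_item_0} has to be generated and parsed in a loop, so the larger the collection passed to foreach, the slower the parsing. Since the cost lies in parsing placeholders, why not build the list yourself? It can be assembled in code, and foreach is no longer needed.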

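In short, MyBatis resolves foreach into the # form underneath, not the $ form. Knowing this, when the collection passed to foreach is very large, you can build the list in code and read it with (#{xxx}), instead of building (1,2,3,4,5) and reading it with ${xxx}.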

First published on the WeChat official account 后端技术精选.
