A basic performance-tuning exercise: can this decompression code be made faster?

Contents

0x01 How it started
0x02 Test method
0x02 What can be optimized
0x03 Let's try optimizing
0x04 Can AI solve this problem?

# 0x01 How it started

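Around six this afternoon, someone in the group chat suddenly @-mentioned me and asked whether decompression has much to do with memory.

I asked for details; the compression he meant was GZipStream.

Generally speaking, decompression depends more on CPU speed than on memory size. Memory use is usually tied to things like the dictionary size used by the compressor; CPU speed is comparatively the more sensitive factor.

Then he said...

Hmm... now this is getting fun. Talk is cheap, show me the code.

Then he posted this code: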

/// <summary>
/// Decompress a multi-file archive
/// </summary>
/// <param name="zipPath">Path of the compressed file</param>
/// <param name="targetPath">Directory to extract into</param>
public void DeCompressMulti(string zipPath, string targetPath)
{
  byte[] fileSize = new byte[4];
  if (File.Exists(zipPath))
  {
    using (FileStream fStream = File.Open(zipPath, FileMode.Open))
    {
      using (MemoryStream ms = new MemoryStream())
      {
        using (GZipStream zipStream = new GZipStream(fStream, CompressionMode.Decompress))
        {
          zipStream.CopyTo(ms);
        }
        ms.Position = 0;
        while (ms.Position != ms.Length)
        {
          ms.Read(fileSize, 0, fileSize.Length);
          int fileNameLength = BitConverter.ToInt32(fileSize, 0);
          byte[] fileNameBytes = new byte[fileNameLength];
          ms.Read(fileNameBytes, 0, fileNameBytes.Length);
          string fileName = System.Text.Encoding.UTF8.GetString(fileNameBytes);
          string fileFulleName = targetPath + fileName;
          ms.Read(fileSize, 0, 4);
          int fileContentLength = BitConverter.ToInt32(fileSize, 0);
          byte[] fileContentBytes = new byte[fileContentLength];
          ms.Read(fileContentBytes, 0, fileContentBytes.Length);
          using (FileStream childFileStream = File.Create(fileFulleName))
          {
            childFileStream.Write(fileContentBytes, 0, fileContentBytes.Length);
          }
        }
      }
    }
  }
}

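Hmm... things were getting interesting, so I decided to churn out a blog post.

If you are interested, pause here, look at the code above, and see how many places you can spot that could be improved.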

# 0x02 Test method

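After a quick look at this code, the first thing I got curious about was how he had tested it.

LSP in the group said it was probably a virtual machine... maybe.

Well, testing in a virtual machine is fairly sound, but it did not seem likely to produce such a large performance gap.

Later he replied: different computers.

In that case the test is pretty much meaningless, because none of the conditions match. The simple and sound approach really is a virtual machine: when you compare the effect of one variable, the others have to be held fixed. If you made Liu Xiang hop on one leg in a race against me and he lost, could you conclude that I run faster than Liu Xiang? Obviously not.

So if we set the benchmark aside, is there still room to optimize this code?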

# 0x02 What can be optimized

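There certainly is.

We are about to go into code analysis; if you still want to work through the code on your own, scroll back up now.

Since the code really is not complicated... here is the annotated version directly.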

/// <summary>
/// Decompress a multi-file archive
/// </summary>
/// <param name="zipPath">Path of the compressed file</param>
/// <param name="targetPath">Directory to extract into</param>
public void DeCompressMulti(string zipPath, string targetPath)
{
  byte[] fileSize = new byte[4];
  
  // [Code style, unrelated to performance] Invert this condition and return early to reduce nesting
  if (File.Exists(zipPath))
  {
    // [Code style, unrelated to performance] Consider stacked using statements to reduce nesting
    using (FileStream fStream = File.Open(zipPath, FileMode.Open))
    {
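      /*
      [Performance issue 1]
      A MemoryStream is used as an intermediate stream to hold the decompressed data. That works,
      but the question to ask here is: do we really have to use this MemoryStream?
      */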
      using (MemoryStream ms = new MemoryStream())
      {
        using (GZipStream zipStream = new GZipStream(fStream, CompressionMode.Decompress))
        {
          zipStream.CopyTo(ms);
        }
        ms.Position = 0;
        while (ms.Position != ms.Length)
        {
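          // [Code style, unrelated to performance] From context, what is read here is actually the
          // file-name length, so the variable name fileSize may cause confusion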
          ms.Read(fileSize, 0, fileSize.Length);
          int fileNameLength = BitConverter.ToInt32(fileSize, 0);
          byte[] fileNameBytes = new byte[fileNameLength];
          ms.Read(fileNameBytes, 0, fileNameBytes.Length);
          string fileName = System.Text.Encoding.UTF8.GetString(fileNameBytes);
          string fileFulleName = targetPath + fileName;
          ms.Read(fileSize, 0, 4);
          int fileContentLength = BitConverter.ToInt32(fileSize, 0);
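          /*
          [Performance issue 2]
          Before the data is written to the file, yet another temporary byte array is allocated and the
          data is copied out of the MemoryStream in full.
          This is the second question: do we have to do it this way, and what problems does it cause?
          */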
          byte[] fileContentBytes = new byte[fileContentLength];
          ms.Read(fileContentBytes, 0, fileContentBytes.Length);
          using (FileStream childFileStream = File.Create(fileFulleName))
          {
            childFileStream.Write(fileContentBytes, 0, fileContentBytes.Length);
          }
        }
      }
    }
  }
}

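Let's set aside the code-style issues noted in the comments for now. Going back to his test result first: why does the machine with 2 GB of memory need 16 seconds, the one with 3 GB only 6 seconds, and the one with 16 GB only 1 second?

So what could cause a performance difference across the whole process? Possibly these three things:

Disk I/O speed
Memory read/write speed
CPU speed

All three are usually fixed in a given environment and cannot be changed. Looking at the code itself, though, there is one glaring problem: its memory use is far too extravagant. His measurements cannot be blamed entirely on memory use, but memory use is at least one of the causes (although the main one is really still the CPU difference).

Look at performance issue 1 and performance issue 2 in the comments above: every time an archive is processed, at least twice the file's size in memory is allocated and then released.

This process also causes heavy memory fragmentation, especially when the MemoryStream is created without an initial capacity and has to keep growing: each time it grows, it allocates a new, larger buffer and copies the old contents across.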
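To make the growth cost concrete, here is a minimal sketch that counts how often a MemoryStream swaps its backing buffer while data is appended. The class `MemoryStreamGrowth` and the method `CountReallocations` are hypothetical names invented for this illustration, and the chunk size is arbitrary. It only observes changes in `Capacity`; it does not measure fragmentation itself. Keep in mind that in .NET, arrays of roughly 85,000 bytes or more are allocated on the Large Object Heap, which is where the discarded buffers of a large stream end up.

```csharp
using System;
using System.IO;

public static class MemoryStreamGrowth
{
    // Writes totalBytes into a MemoryStream in chunkSize pieces and returns how many
    // times the backing buffer was replaced (observed as a change in Capacity).
    public static int CountReallocations(int totalBytes, int chunkSize, int initialCapacity = 0)
    {
        using var ms = new MemoryStream(initialCapacity);
        var chunk = new byte[chunkSize];
        var lastCapacity = ms.Capacity;
        var reallocations = 0;

        for (var written = 0; written < totalBytes; written += chunkSize)
        {
            ms.Write(chunk, 0, chunk.Length);
            if (ms.Capacity != lastCapacity)
            {
                // A new, larger array was allocated and the old data copied into it.
                reallocations++;
                lastCapacity = ms.Capacity;
            }
        }
        return reallocations;
    }
}
```

Calling `MemoryStreamGrowth.CountReallocations(16 * 1024 * 1024, 64 * 1024)` and then the same call with `initialCapacity: 16 * 1024 * 1024` shows the difference between an unsized and a presized stream. Presizing only helps when the decompressed size is known in advance, which is usually not the case for a GZipStream.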

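So with the code above, at least twice the file's volume of data is read and copied, and at least twice the file's size in memory is occupied. On small-memory devices (say, with only 1 GB or 2 GB of memory), the impact is even more pronounced.

Taking a 163 MB file as an example, the code above takes 0.61 seconds and 457 MB of memory. As the target files grow and many files are extracted in one go, the resulting memory use and GC pressure are fatal for performance. When the device itself is underpowered, this makes an already poor speed even worse.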
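The post does not say how these figures were obtained. As one possible approach (an assumption, not the author's method), the sketch below times a single call with `Stopwatch` and reports the bytes allocated on the managed heap by the current thread. `Measure.Run` is a hypothetical helper name. Note that allocated bytes is a different metric from peak memory use, so the numbers it prints are not directly comparable with the ones quoted above.

```csharp
using System;
using System.Diagnostics;

public static class Measure
{
    // Runs the action once and returns the wall-clock time together with the number of
    // bytes the current thread allocated on the managed heap while the action ran.
    public static (TimeSpan Elapsed, long AllocatedBytes) Run(Action action)
    {
        // Start from a settled heap so earlier garbage does not distort the timing.
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();

        var before = GC.GetAllocatedBytesForCurrentThread();
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        var after = GC.GetAllocatedBytesForCurrentThread();

        return (sw.Elapsed, after - before);
    }
}
```

A call might look like `var result = Measure.Run(() => DeCompressMulti(zipPath, targetPath));`, run on the same machine and against the same archive for every version being compared, so that only the code differs between runs.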

Performance has never been only about speed; it is also about stability and resource utilization.

# 0x03 Let's try optimizing

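Can it be optimized? Of course. But in many cases (outside of special situations such as very little memory or only a few files) the speed gain may not be obvious, because most of the time is actually spent on decompression and file I/O.

The excess memory use, however, is something we can eliminate completely.

How? Simple: avoid pointless data copies.

In the code above, the GZipStream data can be read in a streaming fashion without first being copied into a MemoryStream; the same goes for writing the target file. Yet in the code we see day to day, examples of reading everything in one gulp and then writing it out are everywhere.

With small data this is usually simple and has few side effects, but with large data and large files it calls for particular caution. Imagine trying to decompress a 2 GB file this way: a MemoryStream is backed by a single byte array, which cannot exceed roughly 2 GB, so the copy would fail outright.

Following this idea, we can write the following code: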

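// Requires: using System; using System.IO; using System.IO.Compression; using System.Text;
public void DeCompressNew(string zipPath, string targetPath)
{
	if (!File.Exists(zipPath))
		return;

	using var fzip = File.OpenRead(zipPath);
	using var gzip = new GZipStream(fzip, CompressionMode.Decompress);

	var buffer = new byte[4];
	// Fill buf completely. A single Read call may return fewer bytes than requested,
	// so keep reading until the buffer is full or the stream ends.
	// If required is true, a short read throws an exception.
	int ReadBuffer(byte[] buf, bool required = true)
	{
		var total = 0;
		while (total < buf.Length)
		{
			var count = gzip.Read(buf, total, buf.Length - total);
			if (count == 0) break;   // end of stream
			total += count;
		}
		if (required && total != buf.Length) throw new EndOfStreamException();
		return total;
	}

	// Copy buffer shared by all entries: 16 KB
	var readBuffer = new byte[0x400 * 16];

	// Leave the loop when no further header can be read. The length of a decompressed
	// stream is usually unknown, so the position cannot be used as the loop condition.
	// Perhaps this is why the earlier code read everything into a MemoryStream?
	while (ReadBuffer(buffer, false) == 4)
	{
		var nameBuffer = new byte[BitConverter.ToInt32(buffer, 0)];
		ReadBuffer(nameBuffer);
		var fileName = Encoding.UTF8.GetString(nameBuffer, 0, nameBuffer.Length);

		ReadBuffer(buffer);
		var fileSize = BitConverter.ToInt32(buffer, 0);

		using (var fout = File.Create(Path.Combine(targetPath, fileName)))
		{
			// Copy in a streaming fashion to avoid holding the whole file in memory
			var remainsCount = fileSize;

			while (remainsCount > 0)
			{
				var sizeToRead = Math.Min(remainsCount, readBuffer.Length);
				var readCount = gzip.Read(readBuffer, 0, sizeToRead);
				// A truncated archive would otherwise loop forever here
				if (readCount == 0) throw new EndOfStreamException();
				fout.Write(readBuffer, 0, readCount);
				remainsCount -= readCount;
			}
		}
	}
}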

Note: the code above has not been fully tested and may contain bugs; use it at your own risk.
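To try either version you need an archive in the matching format. The post does not show the compression side, so the sketch below infers the layout from the decoders: for each file, a 4-byte name length, the UTF-8 file name, a 4-byte content length, then the content, all wrapped in a single gzip stream. `MultiFileArchive.CompressMulti` is a hypothetical name, and the real writer used by the code's author may differ.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class MultiFileArchive
{
    // Hypothetical counterpart to the decoders in this post; the layout is inferred from them.
    public static void CompressMulti(string[] sourceFiles, string zipPath)
    {
        using var fzip = File.Create(zipPath);
        using var gzip = new GZipStream(fzip, CompressionMode.Compress);

        foreach (var path in sourceFiles)
        {
            var nameBytes = Encoding.UTF8.GetBytes(Path.GetFileName(path));
            using var fin = File.OpenRead(path);

            // The length field is an Int32, so a single entry cannot exceed int.MaxValue bytes.
            if (fin.Length > int.MaxValue)
                throw new NotSupportedException("Entry too large for a 4-byte length field.");

            // BitConverter uses the machine's byte order, matching BitConverter.ToInt32 in the decoders.
            gzip.Write(BitConverter.GetBytes(nameBytes.Length), 0, 4);
            gzip.Write(nameBytes, 0, nameBytes.Length);
            gzip.Write(BitConverter.GetBytes((int)fin.Length), 0, 4);

            // Stream the content straight into the gzip stream; no intermediate buffer of file size.
            fin.CopyTo(gzip);
        }
    }
}
```

Only the bare file name is stored, so files extract flat into the target directory. A decoder built on `Path.Combine(targetPath, fileName)` should still treat names from untrusted archives with care, since a stored name containing `..` or a rooted path could escape the target directory.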

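On the same machine, decompressing the same file, the code above ran in 0.57 seconds and used 5.77 MB of memory.

In a single test the speed does not look much better than before, but memory use is a tiny fraction of the previous version's (only about 1.3%).

One thing worth noting: compared with the previous version, this version's memory use is very stable. It stays at roughly the size of the fixed buffers however large or numerous the target files are, and its speed is correspondingly predictable.

The previous version, by contrast, performs very unevenly as file sizes and file counts vary, and its memory consumption swings enormously.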

# 0x04 Can AI solve this problem?

幽默老师 ("Teacher Humor") in the group fed the original code to ChatGPT and asked it to optimize the code.

ChatGPT then produced this result.

/// <summary>
/// Decompress a multi-file archive
/// </summary>
/// <param name="zipPath">Path of the compressed file</param>
/// <param name="targetPath">Directory to extract into</param>
public async Task DeCompressMultiAsync(string zipPath, string targetPath)
{
    byte[] fileSize = new byte[4];
    if (!File.Exists(zipPath))
    {
        throw new FileNotFoundException("The specified zip file does not exist.");
    }

    if (!Directory.Exists(targetPath))
    {
        Directory.CreateDirectory(targetPath);
    }

    using FileStream fStream = File.Open(zipPath, FileMode.Open);
    using MemoryStream ms = new MemoryStream();
    using (GZipStream zipStream = new GZipStream(fStream, CompressionMode.Decompress))
    {
        await zipStream.CopyToAsync(ms);
    }

    ms.Position = 0;
    while (ms.Position != ms.Length)
    {
        await ms.ReadAsync(fileSize, 0, fileSize.Length);
        int fileNameLength = BitConverter.ToInt32(fileSize, 0);
        byte[] fileNameBytes = new byte[fileNameLength];
        await ms.ReadAsync(fileNameBytes, 0, fileNameBytes.Length);
        string fileName = Encoding.UTF8.GetString(fileNameBytes);
        string fileFulleName = Path.Combine(targetPath, fileName);
        await ms.ReadAsync(fileSize, 0, 4);
        int fileContentLength = BitConverter.ToInt32(fileSize, 0);
        byte[] fileContentBytes = new byte[fileContentLength];
        await ms.ReadAsync(fileContentBytes, 0, fileContentBytes.Length);
        await using FileStream childFileStream = File.Create(fileFulleName);
        await childFileStream.WriteAsync(fileContentBytes, 0, fileContentBytes.Length);
    }
}

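Hmm... it is indeed more polished in quite a few places: a missing archive now throws, the target directory is created, paths are joined with Path.Combine, and the I/O is async. But look closely: the MemoryStream and the per-file byte array are both still there. Have a look yourself if you are interested.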
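For comparison, here is a rough sketch of what an async version could look like if it kept the streaming approach instead of buffering everything. It is not ChatGPT's output and not the author's code; `StreamingExtractor`, `FillAsync` and `DeCompressStreamingAsync` are hypothetical names, the 64 KB buffer size is an arbitrary choice, and like the rest of the code in this post it has not been tested against real archives.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;
using System.Threading.Tasks;

public static class StreamingExtractor
{
    // Reads until count bytes are in buf or the stream ends; returns the bytes actually read.
    private static async Task<int> FillAsync(Stream source, byte[] buf, int count)
    {
        var total = 0;
        while (total < count)
        {
            var n = await source.ReadAsync(buf, total, count - total);
            if (n == 0) break; // end of stream
            total += n;
        }
        return total;
    }

    public static async Task DeCompressStreamingAsync(string zipPath, string targetPath)
    {
        Directory.CreateDirectory(targetPath);

        using var fzip = File.OpenRead(zipPath);
        using var gzip = new GZipStream(fzip, CompressionMode.Decompress);

        var header = new byte[4];
        var copyBuffer = new byte[64 * 1024]; // fixed-size buffer reused for every entry

        // Stop when no further 4-byte header can be read.
        while (await FillAsync(gzip, header, 4) == 4)
        {
            var nameBytes = new byte[BitConverter.ToInt32(header, 0)];
            if (await FillAsync(gzip, nameBytes, nameBytes.Length) != nameBytes.Length)
                throw new EndOfStreamException();
            var fileName = Encoding.UTF8.GetString(nameBytes);

            if (await FillAsync(gzip, header, 4) != 4)
                throw new EndOfStreamException();
            var remaining = BitConverter.ToInt32(header, 0);

            using var fout = File.Create(Path.Combine(targetPath, fileName));
            while (remaining > 0)
            {
                var n = await gzip.ReadAsync(copyBuffer, 0, Math.Min(remaining, copyBuffer.Length));
                if (n == 0) throw new EndOfStreamException(); // truncated archive
                await fout.WriteAsync(copyBuffer, 0, n);
                remaining -= n;
            }
        }
    }
}
```

The async calls do not make decompression itself any faster; what keeps memory flat here is the fixed-size buffer, exactly as in the synchronous version.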

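In the end, when you write code you need a plan for the data and scenarios that code will actually face at run time; only then will you arrive at suitable logic.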
