2015年9月22日 星期二

【MS-SQL】Batch process by Table-Valued Parameters

Entity Framework 大量 update 時使用 SaveChanges() 會是單筆一次 update 造成效能不彰,這裡參考 Yun-Long Lin 的 solution 是採用 資料表值參數 搭配 預存程式 的方式一次傳遞整個資料集,再於預存程序中進行 update,以降低 IO 需求。


資料表
  1. USE [MyLab]
  2. GO
  3.  
  4. SET ANSI_NULLS ON
  5. GO
  6.  
  7. SET QUOTED_IDENTIFIER ON
  8. GO
  9.  
  10. CREATE TABLE [dbo].[Product](
  11. [id] [uniqueidentifier] NOT NULL CONSTRAINT [DF_Product_id] DEFAULT (newid()),
  12. [name] [nvarchar](50) NOT NULL,
  13. [descrp] [nvarchar](max) NULL,
  14. [lastEditAt] [datetime] NOT NULL CONSTRAINT [DF_Product_lastEditAt] DEFAULT (getdate()),
  15. CONSTRAINT [PK_Product] PRIMARY KEY CLUSTERED
  16. (
  17. [id] ASC
  18. )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
  19. ) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
  20.  
  21. GO

建使用者定義資料表類型
  1. USE [MyLab]
  2. GO
  3.  
  4. CREATE TYPE [dbo].[Product] AS TABLE(
  5. [id] [uniqueidentifier] NOT NULL,
  6. [name] [nvarchar](50) NOT NULL,
  7. [descrp] [nvarchar](max) NULL,
  8. [lastEditAt] [smalldatetime] NOT NULL
  9. )
  10. GO

預存程序(建之前 SSMS 要重連,不然似乎會抓不到上一步建的資料表類型)
  1. USE [MyLab]
  2. GO
  3.  
  4. SET ANSI_NULLS ON
  5. GO
  6.  
  7. SET QUOTED_IDENTIFIER ON
  8. GO
  9.  
  10. CREATE Procedure [dbo].[sp_BatchEditProd]
  11. (
  12. @fromClient Product readonly,
  13. @opMsg nvarchar(MAX) output
  14. )
  15. As
  16. UPDATE Product
  17. SET
  18. name = cli.name,
  19. descrp = cli.descrp,
  20. lastEditAt = GETDATE()
  21. FROM Product srv
  22. INNER JOIN @fromClient cli
  23. ON srv.id = cli.id
  24. GO

建測試資料
  1. private static string BatchInsertProd(List lstOfProd)
  2. {
  3. List lst = new List();
  4. for (int i = 0; i < 20000; i++)
  5. {
  6. Product prod = new Product();
  7. prod.id = Guid.NewGuid();
  8. prod.name = MyExts.GetRandomString(6);
  9. prod.descrp = MyExts.GetRandomString(25);
  10. prod.lastEditAt = DateTime.Now;
  11. lst.Add(prod);
  12. }
  13. db.Product.AddRange(lst);
  14. db.SaveChanges();
  15. return string.Empty;
  16. }

亂數
  1. public static string GetRandomString(int len)
  2. {
  3. var str = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
  4. var rand = new Random(Guid.NewGuid().GetHashCode());
  5. var builder = new StringBuilder();
  6. for (var i = 0; i < len; i++)
  7. {
  8. builder.Append(str[rand.Next(0, str.Length)]);
  9. }
  10. return builder.ToString();
  11. }

最後就是使用 Stopwatch 比較兩個方案的耗時

SaveChanges
  1. private static void UpdateProd(List lstOfProd)
  2. {
  3. StirData(lstOfProd);
  4. sw.Start();
  5. db.SaveChanges();
  6. sw.Stop();
  7. Console.WriteLine("UpdateProd m-sec: " + sw.ElapsedMilliseconds);
  8. }

Solution
  1. private static string BatchUpdateProd(List lstOfProd)
  2. {
  3. StirData(lstOfProd);
  4. sw.Start();
  5. string sqlCmd = "sp_BatchEditProd @fromClient, @opMsg out";
  6. string outMsg = string.Empty;
  7. var dt = lstOfProd.ToDataTable();
  8. List lstOfPars = new List() {
  9. new SqlParameter() {
  10. Direction = ParameterDirection.Input,
  11. ParameterName = "fromClient",
  12. SqlDbType = SqlDbType.Structured,
  13. TypeName = "Product",
  14. Value = dt
  15. },
  16. new SqlParameter() {
  17. Direction = ParameterDirection.Output,
  18. ParameterName = "opMsg",
  19. SqlDbType = SqlDbType.NVarChar,
  20. Size = 40,
  21. Value = null
  22. }
  23. };
  24. db.Database.ExecuteSqlCommand(sqlCmd, lstOfPars.ToArray());
  25. sw.Stop();
  26. Console.WriteLine("BatchUpdateProd m-sec: " + sw.ElapsedMilliseconds);
  27. return outMsg;
  28. }

其中,StirData() 會製造 20000筆資料異動,與重設 Stopwatch
  1. private static void StirData(List lstOfProd)
  2. {
  3. sw.Reset();
  4. for (int i = 0; i < 20000; i++)
  5. {
  6. Product prod = lstOfProd[i];
  7. prod.name = MyExts.GetRandomString(6);
  8. prod.descrp = MyExts.GetRandomString(25);
  9. prod.lastEditAt = DateTime.Now;
  10. }
  11. }

最後就是結果比較

ref: Yun-Long Lin - 使用資料表值參數進行資料的批次作業

沒有留言:

張貼留言