1. 14 May, 2022 1 commit
  2. 17 Apr, 2022 1 commit
  3. 27 Mar, 2022 1 commit
  4. 31 Dec, 2021 1 commit
  5. 20 Dec, 2021 1 commit
  6. 29 Jun, 2021 1 commit
  7. 25 Jun, 2021 1 commit
    • Implement softmax kernels via warp reduce · 654febe3
      Summary:
      This commit adds extra CUDA softmax kernels using warp reduce.
      Warp reduce yields better performance when the reduced dimension is <= 256,
      which is common in recent vision transformers.
      Ting PAN committed
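
      A minimal NumPy sketch of the row-wise softmax these kernels compute;
      the warp-shuffle reduction itself lives on the CUDA side, so this only
      pins down the reference semantics:

        import numpy as np

        def softmax_lastdim(x):
            # One warp handles one row when the reduced dimension is small
            # (<= 256): the row max and the exp-sum are each obtained with
            # warp shuffle reductions in the CUDA kernels.
            x_max = np.max(x, axis=-1, keepdims=True)     # reduction 1: row max
            e = np.exp(x - x_max)                         # stabilized exponent
            return e / np.sum(e, axis=-1, keepdims=True)  # reduction 2: row sum

        x = np.random.randn(8, 197, 197).astype('float32')  # ViT-sized logits
        assert np.allclose(softmax_lastdim(x).sum(-1), 1.0, atol=1e-5)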
  8. 22 Jun, 2021 1 commit
  9. 19 Jun, 2021 1 commit
  10. 08 Jun, 2021 1 commit
    • Enhance transpose operators · 936c351b
      Summary:
      This commit allows transpose to compute in-place by leveraging a buffer.
      We also add the CRD mode for space-depth transpose (i.e., pixel shuffle).
      Ting PAN committed
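
      CRD ("column-row-depth") is the ONNX DepthToSpace ordering that matches
      pixel shuffle; a NumPy sketch of its semantics, following the ONNX
      definition:

        import numpy as np

        def depth_to_space_crd(x, block):
            # ONNX DepthToSpace with mode='CRD' (equivalent to pixel shuffle).
            n, c, h, w = x.shape
            tmp = x.reshape(n, c // (block * block), block, block, h, w)
            tmp = tmp.transpose(0, 1, 4, 2, 5, 3)  # interleave blocks into H, W
            return tmp.reshape(n, c // (block * block), h * block, w * block)

        x = np.arange(2 * 8 * 3 * 3, dtype='float32').reshape(2, 8, 3, 3)
        assert depth_to_space_crd(x, 2).shape == (2, 2, 6, 6)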
  11. 31 May, 2021 1 commit
  12. 13 May, 2021 1 commit
  13. 07 May, 2021 1 commit
  14. 01 May, 2021 1 commit
  15. 28 Apr, 2021 1 commit
  16. 21 Apr, 2021 1 commit
    • Add GELU operator · bdf4e10f
      Summary:
      This commit adds the GELU activation, which computes the output
      in either approximate or naive mode.
      Ting PAN committed
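
      Assuming the two modes follow the standard GELU definitions (the exact
      erf form and the tanh approximation), a NumPy sketch:

        import math
        import numpy as np

        def gelu_naive(x):
            # Exact GELU: x * Phi(x), with Phi the standard normal CDF.
            erf = np.vectorize(math.erf)
            return 0.5 * x * (1.0 + erf(x / math.sqrt(2.0)))

        def gelu_approx(x):
            # Tanh approximation, cheaper to evaluate on accelerators.
            c = math.sqrt(2.0 / math.pi)
            return 0.5 * x * (1.0 + np.tanh(c * (x + 0.044715 * x ** 3)))

        x = np.linspace(-4.0, 4.0, 101)
        assert np.max(np.abs(gelu_naive(x) - gelu_approx(x))) < 1e-3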
  17. 14 Apr, 2021 1 commit
  18. 08 Apr, 2021 1 commit
    • Update with the new frontend API · f431756f
      Summary:
      The new frontend unifies the two execution modes while starting from
      a single tensor class. Besides, it emits operator execution through
      a common path that works for both dragon and torch.
      Ting PAN committed
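
      As a rough, hypothetical illustration of the idea only (these names are
      not Dragon's actual API): one tensor class serves both modes, and the
      mode merely decides whether an operator runs now or is recorded into a
      graph, while the emission path stays the same.

        class Tensor(object):
            def __init__(self, value=None, symbol=None):
                self.value = value    # concrete data (eager mode)
                self.symbol = symbol  # graph node (lazy mode)

        def execute(op_type, inputs, eager=True):
            # The common emission path shared by both execution modes.
            if eager:
                return Tensor(value=sum(t.value for t in inputs))  # stand-in kernel
            return Tensor(symbol=(op_type, [t.symbol for t in inputs]))

        y = execute('Add', [Tensor(value=1.), Tensor(value=2.)])             # runs now
        g = execute('Add', [Tensor(symbol='a'), Tensor(symbol='b')], False)  # recorded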
  19. 04 Feb, 2021 1 commit
  20. 25 Jan, 2021 1 commit
  21. 20 Jan, 2021 1 commit
    • Add sysconfig module · bbfecf22
      Summary:
      This commit adds the sysconfig module to get the build information.
      Build information is helpful for selecting tests or reporting issues.
      Ting PAN committed
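
      A hypothetical sketch of how such a module is typically used; the exact
      function and key names in the real sysconfig module may differ:

        def get_build_info():
            # Stand-in for a query against the compiled library.
            return {
                'is_cuda_build': True,   # compiled with CUDA support?
                'cuda_version': '10.2',  # toolkit used for the build
                'cudnn_version': '7.6.5',
            }

        info = get_build_info()
        if not info['is_cuda_build']:
            print('Skip CUDA-only tests on this build.')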
  22. 16 Jan, 2021 1 commit
  23. 29 Dec, 2020 1 commit
  24. 23 Dec, 2020 1 commit
  25. 15 Dec, 2020 1 commit
  26. 11 Dec, 2020 1 commit
  27. 10 Dec, 2020 1 commit
  28. 09 Dec, 2020 1 commit
    • Refactor ONNX frontends and backends · b93bde0d
      Summary:
      This commit redesigns ``vm.onnx`` by referring to the official repository.
      Frontends and backends are aligned with an identical API for dragon, torch and tensorrt.
      Ting PAN committed
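
      The official repository's convention is a Backend.prepare(model) call
      returning a representation with a run(inputs) method; a minimal mock of
      that shared interface (illustrative, not the exact ``vm.onnx`` entry
      points):

        class BackendRep(object):
            def __init__(self, model):
                self.model = model

            def run(self, inputs):
                # A real backend executes the graph; this mock echoes inputs.
                return list(inputs)

        class Backend(object):
            @classmethod
            def prepare(cls, model, device='CPU'):
                return BackendRep(model)

        # Identical calls regardless of the engine behind Backend.
        outputs = Backend.prepare(model=None).run([1.0, 2.0])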
  29. 03 Dec, 2020 1 commit
  30. 02 Dec, 2020 1 commit
  31. 29 Nov, 2020 1 commit
  32. 05 Nov, 2020 1 commit
    • Use FP32 accumulator for FP16 ReduceSum · d56e67d1
      Summary:
      This commit adds a fallback with an FP32 accumulator
      for FP16 ReduceSum to avoid dropping too many small values.
      Besides, most FP16 kernels are now available for arch < 530.
      Ting PAN committed
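
      A short NumPy demonstration of why the FP32 accumulator matters: once an
      FP16 running sum reaches 512, its representable spacing grows to 0.5, so
      adding 0.25 rounds away and the sum stalls.

        import numpy as np

        x = np.full(4096, 0.25, dtype=np.float16)

        fp16_sum = np.float16(0)
        for v in x:                            # sequential FP16 accumulation
            fp16_sum = np.float16(fp16_sum + v)

        fp32_sum = x.astype(np.float32).sum()  # the FP32-accumulator fallback
        print(fp16_sum, fp32_sum)              # 512.0 vs the exact 1024.0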
  33. 24 Oct, 2020 1 commit
  34. 20 Oct, 2020 1 commit
  35. 14 Oct, 2020 1 commit
  36. 13 Oct, 2020 1 commit
    • Add LinSpace Operator · e83c407a
      Summary:
      This commit adds the linspace op for dragon, torch and tensorflow.
      In addition, a workaround for truncated integer intervals (exact up to 2**57)
      is applied to range/linspace.
      Ting PAN committed
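
      The truncation comes from routing integer endpoints through float64,
      whose 53-bit mantissa cannot hold every larger integer. A sketch of the
      failure, plus one exact-integer interpolation (the commit's actual
      workaround may differ):

        print(float(2 ** 53 + 1) == float(2 ** 53))  # True: float64 truncates

        def int_linspace(start, stop, num):
            # Interpolate in Python integers; exact well beyond 2**53.
            if num == 1:
                return [start]
            return [start + i * (stop - start) // (num - 1) for i in range(num)]

        print(int_linspace(0, 2 ** 57, 5))  # exact endpoints and steps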
  37. 08 Oct, 2020 1 commit
  38. 07 Oct, 2020 1 commit
    • Add Sort Operator · b4019faa
      Summary:
      This commit adds the sort op for dragon, torch and tensorflow.
      Besides, a CUDA implementation of the topk op is now available.
      Ting PAN committed
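
      The sort op presumably follows the torch.sort convention of returning
      both the sorted values and their original indices; a NumPy sketch of
      those semantics:

        import numpy as np

        def sort_with_indices(x, axis=-1, descending=False):
            # Return (values, indices), the torch.sort convention.
            idx = np.argsort(-x if descending else x, axis=axis, kind='stable')
            return np.take_along_axis(x, idx, axis=axis), idx

        x = np.array([[3., 1., 2.], [0., 5., 4.]])
        values, indices = sort_with_indices(x, descending=True)
        assert (values[:, 0] == np.array([3., 5.])).all()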
  39. 27 Sep, 2020 1 commit
    • Use local workspace for Context · fdf26ef2
      Summary:
      This commit uses a local (thread or stream) workspace for Context,
      which provides a more elegant way to dispatch kernels requiring scratch memory.
      Besides, the TF32 math type is provided as a cuDNN option for Ampere devices.
      Ting PAN committed
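
      A generic Python sketch of the thread-local workspace pattern
      (illustrative only; the real Context lives in C++): each thread owns its
      own scratch cache, so dispatching kernels that need scratch memory
      requires no cross-thread locking.

        import threading

        _local = threading.local()  # one workspace per thread

        def scratch(name, nbytes):
            ws = getattr(_local, 'ws', None)
            if ws is None:
                ws = _local.ws = {}                 # stands in for an allocator
            buf = ws.get(name)
            if buf is None or len(buf) < nbytes:
                buf = ws[name] = bytearray(nbytes)  # grow-only cached buffer
            return buf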
  40. 10 Sep, 2020 1 commit
    • Add Unique Operator · 1dd8aeef
      Summary:
      This commit adds the unique op for dragon, torch, tensorflow and onnx.
      Besides, it fixes a bug that computed the wrong workspace size
      for cached cuDNN convolutions.
      Ting PAN committed
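
      NumPy's np.unique already exposes the trio of outputs such an op
      typically returns, which pins down the expected semantics:

        import numpy as np

        x = np.array([2, 1, 2, 3, 1, 2])
        values, inverse, counts = np.unique(
            x, return_inverse=True, return_counts=True)
        assert (values[inverse] == x).all()  # inverse indices rebuild the input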