1. 18 Sep, 2022 1 commit
  2. 15 Sep, 2022 1 commit
  3. 14 May, 2022 1 commit
  4. 17 Apr, 2022 1 commit
  5. 27 Mar, 2022 1 commit
  6. 31 Dec, 2021 1 commit
  7. 20 Dec, 2021 1 commit
  8. 29 Jun, 2021 1 commit
  9. 25 Jun, 2021 1 commit
    • Implement softmax kernels via warp reduce · 654febe3
      Summary:
      This commit adds extra CUDA softmax kernels using warp reduce.
      Warp reduce leads to better performance when the reduce dimension is <= 256,
      which is common in recent vision transformers.
      Ting PAN committed
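For context, the pattern these kernels accelerate is the two row-wide reductions inside softmax: a max (for numerical stability) and a sum. When the row length is <= 256, each reduction fits in warp-level shuffles on the GPU. Below is a plain-Python reference of that computation, an illustrative sketch rather than Dragon's kernel; the function name `softmax` is ours.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row, mirroring the two
    reductions (max, then sum) a warp-level CUDA kernel performs."""
    # Reduction 1: row max (a single warp reduce when len(row) <= 256).
    m = max(row)
    # Elementwise exp of the shifted values (avoids overflow).
    exps = [math.exp(v - m) for v in row]
    # Reduction 2: row sum, then a normalizing divide.
    s = sum(exps)
    return [e / s for e in exps]

probs = softmax([1.0, 2.0, 3.0])  # sums to ~1.0
```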
  10. 22 Jun, 2021 1 commit
  11. 19 Jun, 2021 1 commit
  12. 08 Jun, 2021 1 commit
    • Enhance transpose operators · 936c351b
      Summary:
      This commit allows transpose to compute in-place by leveraging a buffer.
      It also adds CRD mode for space-depth transpose (i.e., pixel shuffle).
      Ting PAN committed
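For context on the CRD mode: depth-to-space variants differ only in how the channel index is split. The pure-Python sketch below assumes CRD splits channels as (C // bs**2, bs, bs), the ordering used by torch.nn.PixelShuffle, while DCR would split them as (bs, bs, C // bs**2); the function name `depth_to_space_crd` is illustrative, not Dragon's API.

```python
def depth_to_space_crd(x, bs):
    """Depth-to-space (pixel shuffle) in CRD mode on a CHW nested list.

    CRD: the output-channel index varies slowest in the input channel
    dimension, matching torch.nn.PixelShuffle."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    Co = C // (bs * bs)
    out = [[[0.0] * (W * bs) for _ in range(H * bs)] for _ in range(Co)]
    for c in range(C):
        co, rem = divmod(c, bs * bs)  # CRD: depth (output channel) first
        dy, dx = divmod(rem, bs)      # then the (row, col) block offsets
        for h in range(H):
            for w in range(W):
                out[co][h * bs + dy][w * bs + dx] = x[c][h][w]
    return out
```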
  13. 31 May, 2021 1 commit
  14. 13 May, 2021 1 commit
  15. 07 May, 2021 1 commit
  16. 01 May, 2021 1 commit
  17. 28 Apr, 2021 1 commit
  18. 21 Apr, 2021 1 commit
    • Add GELU operator · bdf4e10f
      Summary:
      This commit adds the GELU activation, which computes its output
      in either approximate or naive mode.
      Ting PAN committed
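The two modes correspond to the exact erf-based formula and the common tanh approximation. A minimal sketch of both, with illustrative function names rather than Dragon's API:

```python
import math

def gelu_naive(x):
    # Exact ("naive") form: x * Phi(x), using the Gaussian CDF via erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_approx(x):
    # Tanh approximation (Hendrycks & Gimpel), cheaper on many devices.
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```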
  19. 14 Apr, 2021 1 commit
  20. 08 Apr, 2021 1 commit
    • Update with the new frontend API · f431756f
      Summary:
      The new frontend unifies the two execution modes and starts from
      a single tensor class. Besides, it emits operator execution through
      a common path that works for both dragon and torch.
      Ting PAN committed
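The idea of a common execution path can be sketched roughly as follows: both frontends build the same (op_type, inputs) record, and one dispatcher either runs it eagerly or defers it into a graph. This is a hypothetical illustration of the pattern, with invented names, not Dragon's actual frontend code.

```python
class OpExec:
    """Hypothetical sketch: one dispatcher shared by two execution modes."""

    def __init__(self, graph_mode=False):
        self.graph_mode = graph_mode
        self.graph = []  # recorded ops when tracing a graph
        self.kernels = {'Add': lambda a, b: a + b,
                        'Mul': lambda a, b: a * b}

    def run(self, op_type, *inputs):
        if self.graph_mode:
            self.graph.append((op_type, inputs))  # defer execution
            return ('Symbol', op_type)
        return self.kernels[op_type](*inputs)     # execute immediately
```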
  21. 04 Feb, 2021 1 commit
  22. 25 Jan, 2021 1 commit
    • Remove support for CUDNN v6 · 73ed1b96
      Summary:
      For consistency in selecting CUDNN convolution algorithms,
      CUDNN v6 (relied on mainly by CUDA 8.0) is now dropped.
      Ting PAN committed
  23. 20 Jan, 2021 1 commit
    • Add sysconfig module · bbfecf22
      Summary:
      This commit adds the sysconfig module to query the build information.
      Build information is helpful for selecting tests or reporting issues.
      Ting PAN committed
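The exact API of Dragon's sysconfig module is not shown here; as an analogy, Python's own standard-library sysconfig module exposes the same kind of build information that helps when selecting tests or filing issues:

```python
import platform
import sysconfig

# Python's stdlib sysconfig plays an analogous role: it reports how
# the interpreter itself was built and what platform it targets.
info = {
    'platform': sysconfig.get_platform(),      # e.g. 'linux-x86_64'
    'python': sysconfig.get_python_version(),  # e.g. '3.11'
    'machine': platform.machine(),
}
print(info)
```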
  24. 16 Jan, 2021 1 commit
  25. 29 Dec, 2020 1 commit
  26. 23 Dec, 2020 1 commit
  27. 15 Dec, 2020 1 commit
  28. 11 Dec, 2020 1 commit
  29. 10 Dec, 2020 1 commit
  30. 09 Dec, 2020 1 commit
    • Refactor ONNX frontends and backends · b93bde0d
      Summary:
      This commit redesigns ``vm.onnx`` by referring to the official repository.
      Frontends and backends are aligned with an identical API for dragon, torch and tensorrt.
      Ting PAN committed
  31. 03 Dec, 2020 1 commit
  32. 02 Dec, 2020 1 commit
  33. 29 Nov, 2020 1 commit
  34. 05 Nov, 2020 1 commit
    • Use FP32 accumulator for FP16 ReduceSum · d56e67d1
      Summary:
      This commit adds a fallback with an FP32 accumulator
      for FP16 ReduceSum to avoid dropping too many small values.
      Besides, FP16 kernels for arch < 530 are now mostly available.
      Ting PAN committed
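The rationale can be demonstrated by simulating half precision with struct's 'e' format: if the running sum is rounded to FP16 at every step, small addends vanish once the sum grows past their rounding threshold, while a wider accumulator keeps them. A sketch of the effect, not Dragon's kernel:

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

def reduce_sum(values, fp16_accumulator):
    acc = 0.0
    for v in values:
        acc += to_fp16(v)
        if fp16_accumulator:
            acc = to_fp16(acc)  # round the running sum every step
    return acc

vals = [1e-4] * 10000  # true sum: ~1.0
naive = reduce_sum(vals, fp16_accumulator=True)   # stalls well below 1.0
safe = reduce_sum(vals, fp16_accumulator=False)   # close to 1.0
```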
  35. 24 Oct, 2020 1 commit
  36. 20 Oct, 2020 1 commit
  37. 14 Oct, 2020 1 commit
  38. 13 Oct, 2020 1 commit
    • Add LinSpace Operator · e83c407a
      Summary:
      This commit adds the linspace op for dragon, torch and tensorflow.
      Additionally, a workaround for truncated integer intervals (up to 2**57) is applied to range/linspace.
      Ting PAN committed
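The truncation being worked around stems from routing integer endpoints through double precision, which is exact only up to 2**53; beyond that, endpoints silently round, which is presumably why range/linspace need an integer-safe path. A quick demonstration:

```python
# Doubles have a 53-bit significand, so integers above 2**53
# can no longer be represented exactly:
assert float(2**53) == 2**53              # exact
assert float(2**53 + 1) == float(2**53)   # truncated: rounds back down

# Computing a large integer range through float arithmetic therefore
# corrupts the endpoints, while integer arithmetic stays correct.
start, stop = 2**53 + 1, 2**53 + 5
exact = list(range(start, stop))                       # correct
via_float = [int(start + i * 1.0) for i in range(4)]   # truncated
print(exact[0] - via_float[0])  # the float path lost the +1
```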
  39. 08 Oct, 2020 1 commit
  40. 07 Oct, 2020 1 commit
    • Add Sort Operator · b4019faa
      Summary:
      This commit adds the sort op for dragon, torch and tensorflow.
      Additionally, a CUDA implementation of the topk op is now available.
      Ting PAN committed
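Functionally, a framework-level sort op returns the sorted values plus the permutation indices, and topk is the partial form of the same selection. An illustrative pure-Python sketch with hypothetical names, not Dragon's API:

```python
import heapq

def sort_op(values, descending=False):
    """Return (sorted values, original indices), the usual output pair."""
    order = sorted(range(len(values)), key=values.__getitem__,
                   reverse=descending)
    return [values[i] for i in order], order

def topk_op(values, k):
    """Top-k largest values with their indices; a heap-based selection
    analogous to what a CUDA topk kernel computes in parallel."""
    order = heapq.nlargest(k, range(len(values)), key=values.__getitem__)
    return [values[i] for i in order], order
```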