Assertion Error: Found no NVIDIA driver on your system

  By: huasurong on Aug. 30, 2023, 6:01 a.m.

Dear organizer,

I submitted a Docker container, but it failed. I tried out the algorithm and got failed results. The error is: AssertionError: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx. However, when I tested the Docker container locally, it ran successfully and output the JSON results. So I would like more information about the error so that I can solve this problem.

Here are the log details below:

 Last edited by: huasurong on Aug. 30, 2023, 7:55 a.m., edited 3 times in total.
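The error text matches the message PyTorch raises when it tries to initialize CUDA and cannot find the NVIDIA driver, so this sketch assumes the algorithm uses PyTorch. Instead of letting the first .cuda() call fail, it checks torch.cuda.is_available() up front and writes what it found to stderr, so a failed run at least leaves an explanation in the job log. pick_device is a hypothetical helper name, not part of any library.

```python
import sys


def pick_device() -> str:
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu".

    Assumes the algorithm uses PyTorch; if torch is not installed,
    this sketch simply reports "cpu".
    """
    try:
        import torch  # assumed framework, optional in this sketch
    except ImportError:
        print("torch is not installed; using CPU", file=sys.stderr)
        return "cpu"

    # is_available() returns False instead of raising when no driver or
    # device is visible, so it is safe to call before any .cuda() call.
    if torch.cuda.is_available():
        print(f"Using GPU 0: {torch.cuda.get_device_name(0)}", file=sys.stderr)
        return "cuda"

    # torch.version.cuda is None for CPU-only builds of PyTorch.
    print(
        f"No usable GPU visible to torch {torch.__version__} "
        f"(built for CUDA {torch.version.cuda}); using CPU",
        file=sys.stderr,
    )
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

Calling device = pick_device() once and then model.to(device) keeps the rest of the code device-agnostic. A CPU fallback may be much slower, so the logged reason is arguably more useful than the fallback itself.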

Re: Assertion Error: Found no NVIDIA driver on your system  

  By: jmsmkn on Aug. 30, 2023, 10:10 a.m.

Your job was run with a GPU attached, so please make sure that your code can detect it. Your container also looks like it could be quite old, as it uses Python 3.7, which is past its end of life. Here is the output from NVIDIA SMI that I have just run on a similar instance:

2023-08-30T10:05:58.664000+00:00 +-----------------------------------------------------------------------------+
2023-08-30T10:05:58.664000+00:00 | NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
2023-08-30T10:05:58.664000+00:00 |-------------------------------+----------------------+----------------------+
2023-08-30T10:05:58.664000+00:00 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
2023-08-30T10:05:58.664000+00:00 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
2023-08-30T10:05:58.664000+00:00 |                               |                      |               MIG M. |
2023-08-30T10:05:58.664000+00:00 |===============================+======================+======================|
2023-08-30T10:05:58.664000+00:00 |   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
2023-08-30T10:05:58.664000+00:00 | N/A   39C    P8    10W /  70W |      0MiB / 15109MiB |      0%      Default |
2023-08-30T10:05:58.664000+00:00 |                               |                      |                  N/A |
2023-08-30T10:05:58.664000+00:00 +-------------------------------+----------------------+----------------------+
2023-08-30T10:05:58.664000+00:00                                                                                
2023-08-30T10:05:58.664000+00:00 +-----------------------------------------------------------------------------+
2023-08-30T10:05:58.664000+00:00 | Processes:                                                                  |
2023-08-30T10:05:58.664000+00:00 |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
2023-08-30T10:05:58.664000+00:00 |        ID   ID                                                   Usage      |
2023-08-30T10:05:58.664000+00:00 |=============================================================================|
2023-08-30T10:05:58.664000+00:00 |  No running processes found                                                 |
2023-08-30T10:05:58.664000+00:00 +-----------------------------------------------------------------------------+
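The banner line of that output carries the two numbers that matter for compatibility: the driver version and the highest CUDA version that driver supports. A minimal sketch (function names hypothetical) that extracts them and runs a rough check against the CUDA version a framework was built for (for PyTorch, torch.version.cuda). The check compares major versions only; that is a simplifying assumption that ignores CUDA forward-compatibility packages and finer minor-version caveats.

```python
import re
from typing import Optional, Tuple

# Matches the nvidia-smi banner, with or without a log timestamp prefix, e.g.
# "| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |"
_BANNER = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+CUDA Version:\s*(?P<cuda>[\d.]+)"
)


def parse_nvidia_smi_banner(text: str) -> Optional[Tuple[str, str]]:
    """Return (driver_version, max_cuda_version), or None if no banner is found."""
    match = _BANNER.search(text)
    if match is None:
        return None
    return match.group("driver"), match.group("cuda")


def cuda_major_ok(driver_cuda: str, build_cuda: str) -> bool:
    """Rough check: the build's CUDA major version must not exceed the driver's."""
    return int(build_cuda.split(".")[0]) <= int(driver_cuda.split(".")[0])


if __name__ == "__main__":
    line = "| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |"
    driver, cuda = parse_nvidia_smi_banner(line)
    print(driver, cuda)  # 470.57.02 11.4
    print(cuda_major_ok(cuda, "11.3"), cuda_major_ok(cuda, "12.1"))  # True False
```

For the instance above, a build targeting CUDA 11.x passes this check, while a CUDA 12.x build would not.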

Re: Assertion Error: Found no NVIDIA driver on your system  

  By: huasurong on Aug. 30, 2023, 11:27 a.m.

Can you send me the logs from my container's run? I don't understand whether the "similar instance" you ran (the one you showed the NVIDIA SMI output from) is our container.

Re: Assertion Error: Found no NVIDIA driver on your system  

  By: jmsmkn on Aug. 30, 2023, 11:40 a.m.

No, it is a different container running on the same infrastructure. You need to fix the code in your own container: an NVIDIA Tesla T4 GPU with driver version 470.57.02 and CUDA version 11.4 was attached to your job. All of the logs for your job are available on the Job Detail page, which is what you posted a screenshot of; there are no other logs from your container.
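Since the Job Detail page only shows what the container itself writes out, one option is to print GPU diagnostics at startup so that the next failed run explains itself. This is a sketch under two assumptions: the job log captures standard output, and the algorithm uses PyTorch (the import is optional here). gpu_diagnostics and log_diagnostics are hypothetical helper names. The sketch reports the Python version, any /dev/nvidia* device nodes, the NVIDIA_VISIBLE_DEVICES and CUDA_VISIBLE_DEVICES variables, the output of nvidia-smi if it is on the PATH, and the CUDA version the installed torch was built for.

```python
import glob
import os
import platform
import shutil
import subprocess
import sys


def gpu_diagnostics() -> dict:
    """Collect facts about GPU visibility inside the container."""
    info = {
        "python": platform.python_version(),
        # Device nodes that the NVIDIA container runtime normally exposes.
        "device_nodes": sorted(glob.glob("/dev/nvidia*")),
        "NVIDIA_VISIBLE_DEVICES": os.environ.get("NVIDIA_VISIBLE_DEVICES"),
        "CUDA_VISIBLE_DEVICES": os.environ.get("CUDA_VISIBLE_DEVICES"),
    }

    # A missing nvidia-smi is itself a clue that no GPU was exposed.
    exe = shutil.which("nvidia-smi")
    if exe is None:
        info["nvidia_smi"] = "not found on PATH"
    else:
        try:
            result = subprocess.run(
                [exe], capture_output=True, text=True, timeout=60, check=False
            )
            info["nvidia_smi"] = result.stdout or result.stderr
        except (subprocess.TimeoutExpired, OSError) as exc:
            info["nvidia_smi"] = f"failed: {exc}"

    try:
        import torch  # assumed framework, optional in this sketch
    except ImportError:
        info["torch"] = None
    else:
        info["torch"] = torch.__version__
        info["torch_cuda_build"] = torch.version.cuda  # None for CPU-only builds
        info["cuda_available"] = torch.cuda.is_available()
    return info


def log_diagnostics() -> dict:
    """Print the diagnostics to stdout so they end up in the job log."""
    info = gpu_diagnostics()
    if sys.version_info < (3, 8):
        print("warning: Python 3.7 and older are past end of life", flush=True)
    for key, value in info.items():
        print(f"{key}: {value}", flush=True)
    return info


if __name__ == "__main__":
    log_diagnostics()
```

Calling log_diagnostics() as the first step of the container's entry point would give output comparable to the NVIDIA SMI dump above, but from inside your own container.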