Notes on MLPerf Pitfalls

Oblivion's blog / 2024-11-09

inference

MLPerf Steps

Install CM

python3 -m venv cm
source cm/bin/activate
pip install cm4mlops
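Every later cm command assumes this virtual environment is active, which is what the "(cm)" prefix in the shell prompts further down shows. The venv activate script exports VIRTUAL_ENV, so a minimal guard can check it. This is a sketch assuming the venv directory name cm used above; require_cm_venv is a hypothetical helper, not part of CM:

```shell
# Hypothetical guard: stop early if the "cm" virtual environment created
# above is not active. The venv activate script exports VIRTUAL_ENV.
require_cm_venv() {
  case "${VIRTUAL_ENV:-}" in
    */cm) return 0 ;;
    *)
      echo "activate the venv first: source cm/bin/activate" >&2
      return 1
      ;;
  esac
}

# Usage: call it at the top of any script that runs `cm run script ...`.
# require_cm_venv || exit 1
```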

Set up a virtual environment for Python

cm run script --tags=install,python-venv --name=mlperf
export CM_SCRIPT_EXTRA_CMD="--adr.python.name=mlperf"

Problem

The xz installed by Spack on our machine is broken: both ldd and file fail to open the binary.

(cm) [rocky@scc112-cpu2 ~]$ ldd /home/rocky/spack/opt/spack/linux-rocky9-zen3/gcc-11.4.1/xz-5.4.6-54q5irsngvod5psb7bhas6tklpiztmcz/bin/xz
ldd: /home/rocky/spack/opt/spack/linux-rocky9-zen3/gcc-11.4.1/xz-5.4.6-54q5irsngvod5psb7bhas6tklpiztmcz/bin/xz: No such file or directory
(cm) [rocky@scc112-cpu2 ~]$ file /home/rocky/spack/opt/spack/linux-rocky9-zen3/gcc-11.4.1/xz-5.4.6-54q5irsngvod5psb7bhas6tklpiztmcz/bin/xz
/home/rocky/spack/opt/spack/linux-rocky9-zen3/gcc-11.4.1/xz-5.4.6-54q5irsngvod5psb7bhas6tklpiztmcz/bin/xz: cannot open /home/rocky/spa

So we installed xz with yum instead and put /usr/bin at the front of PATH so that the system xz is found first.

yum install xz
export PATH=/usr/bin:$PATH
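To confirm the fix, one can check which xz the shell now resolves and that it actually runs. A minimal sketch; check_xz is a hypothetical helper, and it only verifies that `xz --version` succeeds, not that the build is otherwise healthy:

```shell
# Hypothetical helper: report which xz the shell resolves on PATH and
# check that it actually runs (the Spack build above could not be opened).
check_xz() {
  local xz_path
  xz_path=$(command -v xz) || { echo "xz not found on PATH" >&2; return 1; }
  echo "xz resolves to: $xz_path"
  if ! "$xz_path" --version >/dev/null 2>&1; then
    echo "xz at $xz_path does not run" >&2
    return 1
  fi
}

# Usage after `export PATH=/usr/bin:$PATH`; we would expect /usr/bin/xz.
# check_xz
```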

The script that generates the actual submission tree checks that test_query_count is at least 10833, so we adjusted this in the script.
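The rule described above is a simple lower bound. A hypothetical sketch of the same check (not the actual CM code) that can validate a value before launching a long run; MIN_TEST_QUERY_COUNT and check_test_query_count are illustrative names:

```shell
# Hypothetical helper mirroring the check described above; this is not the
# actual CM code, only the same rule: test_query_count must be >= 10833.
MIN_TEST_QUERY_COUNT=10833

check_test_query_count() {
  # Reject empty or non-numeric input before comparing.
  case "$1" in
    ''|*[!0-9]*) echo "not a number: $1" >&2; return 1 ;;
  esac
  if [ "$1" -lt "$MIN_TEST_QUERY_COUNT" ]; then
    echo "test_query_count=$1 is below $MIN_TEST_QUERY_COUNT" >&2
    return 1
  fi
}

# Usage: check_test_query_count 60833 && echo ok
```

Both values used in the commands below (60833 and 10833) pass this check.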

Optimize

performance run:

taskset -c 0-31 cm run script --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
   --model=bert-99 \
   --implementation=reference \
   --framework=deepsparse \
   --category=edge \
   --scenario=Offline \
   --execution_mode=test \
   --device=cpu \
   --quiet \
   --test_query_count=60833 \
   --env.CM_MLPERF_NEURALMAGIC_MODEL_ZOO_STUB=zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/base_quant-none \
   --batch_size=64 \
   --env.OMP_NUM_THREADS=32

accuracy run:

taskset -c 0-31 cm run script --tags=run-mlperf,inference,_r4.1-dev \
   --model=bert-99 \
   --implementation=reference \
   --framework=deepsparse \
   --category=edge \
   --scenario=Offline \
   --execution_mode=valid \
   --device=cpu \
   --quiet \
   --env.CM_MLPERF_NEURALMAGIC_MODEL_ZOO_STUB=zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/base_quant-none \
   --batch_size=64 \
   --env.OMP_NUM_THREADS=32 \
   --test_query_count=10833 
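The two commands share almost every flag. A hypothetical wrapper can keep the shared flags in one place and take the differing ones (tags, execution mode, test_query_count) as arguments. This assumes cm accepts flags in any order, which the commands above suggest, since they place --test_query_count at different positions; run_bert_deepsparse is an illustrative name:

```shell
# Hypothetical wrapper: shared flags from the two runs above; the tags,
# execution mode and test_query_count are passed through as "$@".
run_bert_deepsparse() {
  taskset -c 0-31 cm run script "$@" \
    --model=bert-99 \
    --implementation=reference \
    --framework=deepsparse \
    --category=edge \
    --scenario=Offline \
    --device=cpu \
    --quiet \
    --env.CM_MLPERF_NEURALMAGIC_MODEL_ZOO_STUB=zoo:nlp/question_answering/mobilebert-none/pytorch/huggingface/squad/base_quant-none \
    --batch_size=64 \
    --env.OMP_NUM_THREADS=32
}

# Performance run:
# run_bert_deepsparse --tags=run-mlperf,inference,_find-performance,_full,_r4.1-dev \
#   --execution_mode=test --test_query_count=60833
# Accuracy run:
# run_bert_deepsparse --tags=run-mlperf,inference,_r4.1-dev \
#   --execution_mode=valid --test_query_count=10833
```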

We use taskset -c 0-31 to pin the process to CPU cores 0-31, which avoids the performance loss caused by the scheduler migrating it between cores.
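A child process inherits its parent's CPU affinity mask (and the mask survives exec), so everything cm launches under taskset should stay on cores 0-31. To check this during a run, a hypothetical helper can read the Cpus_allowed_list field of /proc/<pid>/status; this is Linux-only, and affinity_of is an illustrative name:

```shell
# Hypothetical helper: print the CPU affinity list of a process (default:
# the current shell) from the Cpus_allowed_list field of /proc/<pid>/status.
affinity_of() {
  awk -F: '/^Cpus_allowed_list:/ { gsub(/[ \t]/, "", $2); print $2 }' \
    "/proc/${1:-$$}/status"
}

# Usage while a benchmark is running (we would expect "0-31"):
# affinity_of <pid-of-the-benchmark-process>
```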

We chose deepsparse as the framework because it offers higher performance.

We tried several values of batch_size and settled on 64, which gave the highest performance.
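Choosing the batch size amounts to a small sweep: run the performance command once per candidate and keep the fastest. In this sketch the candidate sizes are illustrative, best_batch_size is a hypothetical helper, and the input format (one "batch_size samples_per_second" pair per line, noted by hand from each run) is an assumption, not something CM produces:

```shell
# Hypothetical helper: read "<batch_size> <samples_per_second>" lines and
# print the batch size with the highest throughput (nothing if no input).
best_batch_size() {
  awk 'NF == 2 && (!seen || $2 + 0 > best) { best = $2 + 0; bs = $1; seen = 1 }
       END { if (seen) print bs }' "$@"
}

# Sweep sketch (illustrative sizes; other flags as in the performance run):
# for bs in 16 32 64 128; do
#   taskset -c 0-31 cm run script <flags from the performance run> --batch_size="$bs"
# done
# Then: best_batch_size results.txt
```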

We set OMP_NUM_THREADS to 32 because our machine has 32 cores and lscpu reports Thread(s) per core as 1 (no SMT), so this gives one OpenMP thread per physical core.
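The thread count can also be derived from lscpu instead of being hard-coded. A sketch, assuming lscpu's English field names (Thread(s) per core, Core(s) per socket, Socket(s)); cpu_topology is a hypothetical helper that prints the physical core count followed by threads per core, and it accepts both the flat and the indented lscpu layouts:

```shell
# Hypothetical helper: read `lscpu` output on stdin and print
# "<physical cores> <threads per core>", where physical cores is
# sockets x cores per socket. Leading spaces are allowed because newer
# lscpu versions indent these fields.
cpu_topology() {
  awk -F: '
    /^ *Thread\(s\) per core:/ { tpc = $2 + 0 }
    /^ *Core\(s\) per socket:/ { cps = $2 + 0 }
    /^ *Socket\(s\):/          { sockets = $2 + 0 }
    END { print cps * sockets, tpc }
  '
}

# Usage (bash):
# read -r cores tpc <<< "$(lscpu | cpu_topology)"
# export OMP_NUM_THREADS="$cores"
# taskset -c "0-$((cores - 1))" cm run script ...
```

The 0-(cores-1) range only lines up with distinct physical cores when there is 1 thread per core, as on our machine; with SMT the mapping of logical CPUs to cores should be read from lscpu -e instead.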

Submit

We set env.CM_FRAMEWORK to deepsparse.

cm run script --tags=generate,inference,submission \
   --clean \
   --preprocess_submission=yes \
   --run-checker \
   --tar=yes \
   --env.CM_TAR_OUTFILE=submission.tar.gz \
   --division=open \
   --category=edge \
   --env.CM_DETERMINE_MEMORY_CONFIGURATION=yes \
   --run_style=valid \
   --quiet \
   --submitter=scc112 \
   --env.CM_FRAMEWORK=deepsparse \
   --hw_name="scc112-cpu2"
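Before uploading, it can help to look at the top of the generated tarball (the file named by CM_TAR_OUTFILE). A minimal sketch; tar_tree is a hypothetical helper that prints the distinct paths inside a .tar.gz up to a given depth (default 2):

```shell
# Hypothetical helper: list the distinct paths inside a .tar.gz archive,
# truncated to the first N components (default 2), sorted and deduplicated.
tar_tree() {
  tar -tzf "$1" | awk -F/ -v depth="${2:-2}" '
    {
      out = $1
      # Append components 2..depth, skipping the empty field left by a
      # trailing slash on directory entries.
      for (i = 2; i <= depth && i <= NF; i++)
        if ($i != "") out = out "/" $i
      print out
    }' | LC_ALL=C sort -u
}

# Usage: tar_tree submission.tar.gz 3
```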