Enterprise Analytics Platform FAQs¶
FAQs for MI Lab¶
Q: What preparation is required for accessing data in HDFS/HIVE through the Notebook?¶
A: Complete the following steps before accessing data in HDFS/HIVE through the Notebook:
When requesting container resources through Resource Management, select Enable read access for HDFS and Data Warehouse.
When adding a PVC through Enterprise Analytics Platform > Resource Config > Storage Config, select the requested resource that has read access for HDFS and Data Warehouse and is used by MI Lab.
When creating the Notebook instance, select the spark or pyspark image, and also select the Mount Hadoop PVC option.
Open the Jupyter Notebook and run kinit -kt /etc/security/keytab/data_xxxxx data_xxxxx@ENIOT.IO (where xxxxx is the OU ID) in the Terminal to renew the Kerberos ticket.
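The same ticket renewal can also be run from a notebook cell instead of the Terminal. A minimal sketch, assuming the keytab path and principal from the step above (replace xxxxx with your OU ID):

import subprocess

# Renew the Kerberos ticket using the keytab mounted in the container
# (same command as the Terminal step above; xxxxx is the OU ID).
subprocess.run(
    ["kinit", "-kt", "/etc/security/keytab/data_xxxxx", "data_xxxxx@ENIOT.IO"],
    check=True,
)

# Optionally confirm that a valid ticket now exists.
subprocess.run(["klist"], check=True)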
Q: How to use Python to retrieve a large amount of data from HIVE for model training?¶
A: If the data volume is small, use pyhive. For a large data volume, consider processing the data with HIVE SQL and compressing it into ORC files, downloading the files from HDFS to local storage, and then processing the ORC files with the pyarrow package.
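For reference, a minimal sketch of both approaches, assuming a Kerberos-authenticated HiveServer2 (the host name, port, table name, and local ORC file path below are placeholders):

import pandas as pd
from pyhive import hive
import pyarrow.orc as orc

# Small data volume: query HIVE directly through pyhive.
conn = hive.Connection(
    host="hive-server",             # placeholder HiveServer2 host
    port=10000,
    auth="KERBEROS",
    kerberos_service_name="hive",
)
small_df = pd.read_sql("SELECT * FROM demo_db.demo_table LIMIT 1000", conn)

# Large data volume: read an ORC file that was produced by HIVE SQL
# and downloaded from HDFS to local storage.
table = orc.ORCFile("/tmp/demo_table/part-00000.orc").read()
large_df = table.to_pandas()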
Q: How to collaborate with others within the Notebook?¶
A: Mount the same PVC storage in different Notebook instances to share files and collaborate.
Q: The Kernel fails to start (Error Starting Kernel) after switching the environment or performing other operations in the Notebook. How to resolve this problem?¶
A: Try running the python3 -m ipykernel install --user command in the Terminal.
Q: Can I add a new Kernel in the Notebook?¶
A: Yes. Please refer to the following commands:
conda create -n py36-test python=3.6
source activate py36-test
conda install ipykernel
python -m ipykernel install --name py36-test
conda deactivate
Q: Why does my Notebook instance become slow after using the Notebook for some time?¶
A: When you open a Notebook instance, some Kernel sessions or Terminal sessions are created. When you close the Notebook, these sessions are not closed, so that they are available the next time you open the Notebook. Close these sessions manually if they are no longer needed.
Q: After installing some packages in the Notebook, package dependency issues occur. How can I restore the Notebook instance to its initial status?¶
A: In the Notebook menu, click File > Shut Down to close the Notebook instance. Open the Notebook again to restore it to its initial status.
FAQs for MI Hub¶
Q: When calling model service APIs, if the request body is too big or the processing time exceeds the limit, a timeout error will be reported. How to solve this problem?¶
A: When deploying a model version, you can set the Timeout value for the model service API (the maximum value is 600,000 ms).
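On the client side, it also helps to set a matching timeout when calling the API. A minimal sketch with the requests package (the service URL and payload below are placeholders for your own model service):

import requests

# Placeholder model service endpoint and request payload.
url = "https://<apim-gateway>/<model-service-path>/predict"
payload = {"data": {"ndarray": [[1.0, 2.0, 3.0]]}}

# Match the client-side timeout (in seconds) to the Timeout value
# configured for the model service API (600,000 ms = 600 s at most).
response = requests.post(url, json=payload, timeout=600)
response.raise_for_status()
print(response.json())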
Q: When using MLflow version 1.10.0, a compatibility issue may occur, which prevents the model version from being published successfully. How to solve this problem?¶
A: MLflow version 1.8 is integrated in MI Hub and MI Lab by default. If you upgrade MLflow to version 1.10.0 or use a newer MLflow version in model development, you must still use MLflow 1.8 artifact files for publishing model versions.
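A minimal sketch of guarding against this in the development environment before logging model artifacts (the model object and artifact path below are placeholders):

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LinearRegression

# Artifacts to be published in MI Hub should be produced with MLflow 1.8.
assert mlflow.__version__.startswith("1.8"), (
    "Use MLflow 1.8 (for example, pip install mlflow==1.8.0) "
    "when logging artifacts for publishing model versions."
)

model = LinearRegression()  # placeholder model
with mlflow.start_run():
    mlflow.sklearn.log_model(model, artifact_path="model")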
Q: Model services deployed by MI Hub can be called within the cluster only. How to call the services across clusters?¶
A: To expose model services, the services must be published through EnOS API Management, which provides authentication and traffic control services. For more information about EnOS APIM, see API Management.
Q: When calling model service APIs, what is the scope of using authentication?¶
A: When calling EAP model service APIs, use the authentication function in a way other than the Seldon SDK. For internal calls with REST or gRPC, authentication is not required.
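For reference, a minimal sketch of an internal REST call that carries no authentication, assuming a Seldon-style prediction endpoint (the in-cluster service name, namespace, port, and payload shape below are assumptions):

import requests

# Placeholder in-cluster address of the deployed model service.
url = (
    "http://<model-service-name>.<namespace>.svc.cluster.local:8000"
    "/api/v1.0/predictions"
)
payload = {"data": {"ndarray": [[0.1, 0.2, 0.3]]}}

# Internal call: no authentication headers are attached.
response = requests.post(url, json=payload)
print(response.json())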
Q: The request time for calling EAP model service APIs is not stable. How to improve the stability of the request time?¶
A: You can try increasing the memory request when deploying the model, and then test calling the model service through Postman.
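As a scripted alternative to Postman, the spread of response times can be measured directly, assuming the same placeholder endpoint and payload as in the timeout example above:

import time
import requests

url = "https://<apim-gateway>/<model-service-path>/predict"  # placeholder
payload = {"data": {"ndarray": [[1.0, 2.0, 3.0]]}}

# Send a batch of requests and report the spread of response times.
latencies = []
for _ in range(20):
    start = time.perf_counter()
    requests.post(url, json=payload, timeout=600)
    latencies.append(time.perf_counter() - start)

print(f"min={min(latencies):.3f}s max={max(latencies):.3f}s "
      f"avg={sum(latencies) / len(latencies):.3f}s")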
FAQs for MI Pipelines¶
Q: The workflows in both Enterprise Analytics Platform and Enterprise Data Platform support scheduling. What are the differences?¶
A: The batch processing workflows of Enterprise Data Platform are for data synchronization and data processing; they support synchronizing and processing structured data and file streams based on Data IDE, Shell, and Python, and are used by data engineers.
The intelligent workflows of Enterprise Analytics Platform are for the lifecycle management of machine learning models, including data preparation, model training, model deployment, and model prediction services, and are used by data scientists.
Q: When workflows are running, how to control the concurrency in high-concurrency scenarios?¶
A: Use the following methods to control workflow concurrency at different levels:
Control the maximum number of simultaneous runs of each workflow by setting the maximum concurrency number at runtime.
Control the maximum number of concurrent pods of a single workflow by setting the advanced parameter “maximum pod number”.
Control the item concurrency of the ParallelFor operator by setting the concurrency parameter of the operator.
Control the concurrency of operators within the ParallelFor operator by setting its “maximum pod number” parameter.
By setting the above 4 parameters, you can control the concurrency of a workflow at every level, from runs down to pods.
Q: Can data be transferred between operators of a workflow?¶
A: Yes, you can use the File operator or the Directory operator for transferring data.
Q: When an operator in a workflow runs into an error, can the workflow rerun from where the error occurred?¶
A: Yes. You can click Retry on the Running Instance Detail page when a running error occurs. Note that rerunning the workflow is intended for occasional errors only: if the operator parameter configuration is modified after the error occurs, or if the run exceeds the timeout setting of the workflow, rerunning will not take effect.
Q: How to monitor the resource usage of a running workflow?¶
A: You can find the pod name on the Running Instance Detail page, and then check and monitor the resource usage of the pod in Grafana by searching for the pod name.
FAQs for Resource Management¶
Q: How do the resources that are requested through Resource Management correspond to EAP? How to set the Request and Limit?¶
A: The requested resources correspond to the resource quota of the resource pool. The total EAP resource consumption in this resource pool, including Notebook instances, model services, operators, and so on, cannot exceed the quota. Requests define the minimum amount of resources that the containers need, and Pod scheduling is based on requests: a Pod is scheduled to run on a Node only if the Node has enough CPU resources available to satisfy the Pod's requests. Limits define the maximum amount of resources that the containers can consume, and setting limits prevents a Pod from using all the resources available on a Node. For more information, see the Kubernetes Documentation.
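In EAP these values are configured through the console, but the underlying Kubernetes semantics can be illustrated with the Kubernetes Python client (a sketch only; the container name, image, and resource values below are placeholders):

from kubernetes import client

# Requests: the minimum resources the container needs; Pod scheduling is based on these.
# Limits: the maximum resources the container may consume on the Node.
resources = client.V1ResourceRequirements(
    requests={"cpu": "500m", "memory": "1Gi"},
    limits={"cpu": "1", "memory": "2Gi"},
)

container = client.V1Container(
    name="notebook",                      # placeholder container name
    image="example/notebook:latest",      # placeholder image
    resources=resources,
)
print(container.resources)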