File Operators


MI Pipelines provides the following Git-based and HDFS-based file operators, which can be used to get files or directories:

  • Git Directory Operator

  • Git File Operator

  • HDFS Directory Operator

  • HDFS File Operator

Git Directory Operator

The Git Directory operator gets all the files in the specified directories of a Git project. It is often used as an upstream operator for Shell, Python, Notebook, and other operators to supply the code files they require. For example:

[Figure: ../_images/git_dir_calculator.png]

Input Parameters Description

| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| project | Required | String | Git project name. |
| branch | Required | String | Git branch name. |
| paths | Required | List | List of paths, where each element may be a file or a directory. For example: ["modelhosting_prj/model6/test1.py"]. |
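As an illustration only, the input parameters above can be written out as a plain Python dictionary and checked before they are entered into the operator. Only the parameter names and types come from the table; the `validate_git_dir_inputs` helper and the sample values (`my_git_source`, `my_project`, `master`) are hypothetical and are not part of MI Pipelines.

```python
# Hypothetical sketch: parameter names and types follow the table above.
# The helper and the sample values are assumptions, not an MI Pipelines API.
REQUIRED_GIT_DIR_INPUTS = {
    "data_source_name": str,
    "project": str,
    "branch": str,
    "paths": list,
}


def validate_git_dir_inputs(params):
    """Return a list of problems found in a Git Directory input mapping."""
    problems = []
    for name, expected_type in REQUIRED_GIT_DIR_INPUTS.items():
        if name not in params:
            problems.append(f"missing required parameter: {name}")
        elif not isinstance(params[name], expected_type):
            problems.append(f"{name} must be of type {expected_type.__name__}")

    # Each element of paths names a file or a directory, so it must be text.
    paths = params.get("paths")
    if isinstance(paths, list):
        for entry in paths:
            if not isinstance(entry, str) or not entry:
                problems.append(f"invalid paths entry: {entry!r}")
    return problems


git_dir_inputs = {
    "data_source_name": "my_git_source",  # assumed sample value
    "project": "my_project",              # assumed sample value
    "branch": "master",                   # assumed sample value
    "paths": ["modelhosting_prj/model6/test1.py"],
}

print(validate_git_dir_inputs(git_dir_inputs))  # prints []
```

An empty result means every required parameter is present with the expected type.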

Output Parameters Description

| Name | Type | Description |
| --- | --- | --- |
| workspace | Directory | Directory (stored in MinIO) that contains the retrieved files. It is of Directory type and outputs the directories and files listed in paths as a workspace. |
| paths | List | List of file paths, which subsequent operators can traverse to process the files one by one. |
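A downstream Python operator could traverse these outputs roughly as follows. This is a minimal sketch that assumes `workspace` arrives as a local directory path and that each entry in `paths` is relative to it; how MI Pipelines actually hands the outputs to a downstream operator is not specified here, and `iter_workspace_files` is a hypothetical helper.

```python
import os
import tempfile


def iter_workspace_files(workspace, paths):
    """Yield the full path of every file reachable from the entries in paths.

    Assumption: each entry is relative to workspace and is either a file
    or a directory. Entries that do not exist are skipped.
    """
    for entry in paths:
        full = os.path.join(workspace, entry)
        if os.path.isdir(full):
            # Directory entry: walk it and yield every file underneath.
            for root, _dirs, files in os.walk(full):
                for name in sorted(files):
                    yield os.path.join(root, name)
        elif os.path.isfile(full):
            yield full


# Demo against a temporary directory standing in for the workspace output.
with tempfile.TemporaryDirectory() as workspace:
    target = os.path.join(workspace, "modelhosting_prj", "model6")
    os.makedirs(target)
    with open(os.path.join(target, "test1.py"), "w") as handle:
        handle.write("print('hello')\n")

    for path in iter_workspace_files(workspace, ["modelhosting_prj/model6/test1.py"]):
        print(os.path.relpath(path, workspace))
```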

Git File Operator

The Git File operator gets a single specified file from a Git repository for use as input to other operators.

Input Parameters Description

| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| project | Required | String | Git project name. |
| branch | Required | String | Git branch name. |
| file_path | Required | String | File path. |
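Unlike the `paths` list of the Git Directory operator, `file_path` names exactly one file. The sketch below captures that difference in a small typed container; `GitFileInputs` and the sample values are hypothetical and only mirror the parameter names and types listed above.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class GitFileInputs:
    """Hypothetical container mirroring the Git File input parameters."""

    data_source_name: str
    project: str
    branch: str
    file_path: str

    def __post_init__(self):
        # All four parameters are required strings; a list passed as
        # file_path is rejected because the operator gets a single file.
        for name, value in asdict(self).items():
            if not isinstance(value, str) or not value:
                raise ValueError(f"{name} must be a non-empty string")


git_file_inputs = GitFileInputs(
    data_source_name="my_git_source",  # assumed sample value
    project="my_project",              # assumed sample value
    branch="master",                   # assumed sample value
    file_path="modelhosting_prj/model6/test1.py",
)

print(asdict(git_file_inputs))
```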

Output Parameters Description

| Name | Type | Description |
| --- | --- | --- |
| file | File | The single file pulled from Git. |

HDFS Directory Operator

The HDFS Directory operator is used to get one or more files in a specified directory from HDFS.

Input Parameters Description

| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| file_paths | Required | List | HDFS file path list. |

Output Parameters Description

| Name | Type | Description |
| --- | --- | --- |
| workspace | Directory | Directory that contains the retrieved files. |
| paths | List | List of file paths, which subsequent operators can traverse to process the files one by one. |

HDFS File Operator

The HDFS File operator is used to get a single file in a specified directory from HDFS.

Input Parameters Description

| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| file_path | Required | String | HDFS file path. |

Output Parameters Description

| Name | Type | Description |
| --- | --- | --- |
| file | File | The single file retrieved from HDFS. |
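The two HDFS operators differ mainly in whether they take one `file_path` or a `file_paths` list. As a rough sketch of that distinction, the hypothetical helper below picks an operator and builds its input parameters from the number of paths supplied. It assumes a single path names a file rather than a directory; `build_hdfs_inputs` and the sample paths are illustrative and are not part of MI Pipelines.

```python
def build_hdfs_inputs(data_source_name, hdfs_paths):
    """Pick an HDFS operator and build its input parameters.

    Returns (operator_name, params). One path maps to the HDFS File
    operator; several paths map to the HDFS Directory operator.
    """
    if isinstance(hdfs_paths, str):
        hdfs_paths = [hdfs_paths]
    if not hdfs_paths:
        raise ValueError("at least one HDFS path is required")

    if len(hdfs_paths) == 1:
        params = {"data_source_name": data_source_name, "file_path": hdfs_paths[0]}
        return "HDFS File", params

    params = {"data_source_name": data_source_name, "file_paths": list(hdfs_paths)}
    return "HDFS Directory", params


# Assumed sample values for illustration.
print(build_hdfs_inputs("my_hdfs_source", "/data/models/a.csv"))
print(build_hdfs_inputs("my_hdfs_source", ["/data/models/a.csv", "/data/models/b.csv"]))
```

Because the HDFS Directory operator also accepts a list with a single entry, the single-path branch is a convenience choice rather than a requirement.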