File Source Operators
MI Pipelines provides the following file source operators, based on Git and HDFS, which can be used to get files or directories:
- Git Directory Operator
- Git File Operator
- HDFS Directory Operator
- HDFS File Operator
- HDFS Uploader Operator
Git Directory Operator
The Git Directory operator gets all the files in a directory of a Git repository. It is often used as a pre-operator for Shell, Python, Notebook, and other operators to provide the code files they require.
Output Parameters Description
| Name | Type | Description |
| --- | --- | --- |
| workspace | Directory | Directory (in MinIO) where the retrieved files are located. It is of directory type and exposes the directories and files listed in paths as a workspace. |
| paths | List | List of file paths, which subsequent operators can iterate over to process each file in turn. |
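A downstream Python operator might consume these two outputs roughly as follows. This is a minimal sketch, assuming the workspace output is available as a local directory path and that each entry in paths is relative to it; this page confirms neither assumption. `resolve_paths` is a hypothetical helper, not part of MI Pipelines, and the file names are made up for illustration.

```python
import os
import tempfile


def resolve_paths(workspace, paths, suffix=None):
    """Join each entry of `paths` onto `workspace`, optionally filtering by suffix.

    Assumption: `paths` holds paths relative to `workspace`.
    """
    resolved = []
    for rel in paths:
        if suffix is not None and not rel.endswith(suffix):
            continue
        resolved.append(os.path.join(workspace, rel))
    return resolved


# Simulate the operator outputs with a temporary directory.
with tempfile.TemporaryDirectory() as workspace:
    paths = ["train.py", "utils/io.py", "README.md"]
    for rel in paths:
        full = os.path.join(workspace, rel)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "w") as fh:
            fh.write("# placeholder\n")

    # Keep only the Python sources and confirm they exist on disk.
    python_files = resolve_paths(workspace, paths, suffix=".py")
    existing = [p for p in python_files if os.path.isfile(p)]
    print(len(existing))
```

Filtering by suffix is only one way to pick files from the list; a real pipeline could equally pass the whole workspace to a Shell operator.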
Git File Operator
The Git File operator gets a single specified file from a Git repository for use as input to other operators.
Input Parameters Description
| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| project | Required | String | Git project name. |
| branch | Required | String | Git branch name. |
| file_path | Required | String | File path. |
Output Parameters Description
| Name | Type | Description |
| --- | --- | --- |
| file | File | The single file pulled from Git. |
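Because all four input parameters are required, it can help to check a configuration before running the pipeline. The sketch below is illustrative only: the parameter names come from the table above, but the dictionary layout, the sample values, and the `missing_params` helper are assumptions rather than an MI Pipelines API.

```python
# Required inputs of the Git File operator, as listed in the table above.
REQUIRED_GIT_FILE_PARAMS = ("data_source_name", "project", "branch", "file_path")


def missing_params(params):
    """Return the required Git File parameters that are absent or empty."""
    return [name for name in REQUIRED_GIT_FILE_PARAMS if not params.get(name)]


# Hypothetical values for illustration only.
git_file_params = {
    "data_source_name": "my_git_source",
    "project": "demo-project",
    "branch": "main",
    "file_path": "scripts/train.py",
}

print(missing_params(git_file_params))
print(missing_params({"project": "demo-project"}))
```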
HDFS Directory Operator
The HDFS Directory operator is used to get one or more files in a specified directory from HDFS.
Input Parameters Description
| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| file_paths | Required | List | HDFS file path list. |
Output Parameters Description
| Name | Type | Description |
| --- | --- | --- |
| workspace | Directory | File directory. |
| paths | List | List of file paths, which subsequent operators can iterate over to process each file in turn. |
HDFS File Operator
The HDFS File operator is used to get a single file in a specified directory from HDFS.
Input Parameters Description
| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| file_path | Required | String | HDFS file path. |
Output Parameters Description
| Name | Type | Description |
| --- | --- | --- |
| file | File | The single file retrieved from HDFS. |
HDFS Uploader Operator
The HDFS Uploader operator uploads a specified file to a specified HDFS directory. It has no output parameters.
Input Parameters Description
| Name | Required/Optional | Type | Description |
| --- | --- | --- | --- |
| data_source_name | Required | String | Data source name from the data source connection configuration. |
| file | Optional | File | The file to upload, which can be obtained from another file source operator such as a Git operator or an HDFS operator. |
| filename | Optional | File | The new name of the file after it is uploaded. |
| directory | Optional | Directory | Current path of the file. |
| dest | Optional | String | Destination path of the file. |
| overwrite | Optional | Boolean | Specifies whether to overwrite a file with the same name in the destination folder. Select true to overwrite, or false to prevent overwriting. |
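The interaction between dest, filename, and overwrite can be sketched as a small planning function. This is an illustration under stated assumptions, not the operator's actual implementation: it assumes the target path is dest joined with filename (or the source file's own name when filename is omitted), and that a name clash with overwrite set to false causes the upload to be skipped. `plan_upload` and the sample paths are hypothetical.

```python
import posixpath


def plan_upload(source_name, dest, existing, filename=None, overwrite=False):
    """Return (target_path, action) for an upload.

    Assumptions: the target is `dest` joined with `filename` (or the source
    file's own name when `filename` is omitted), and a name clash with
    overwrite=False results in the upload being skipped.
    """
    target = posixpath.join(dest, filename or posixpath.basename(source_name))
    if target in existing:
        action = "overwrite" if overwrite else "skip"
    else:
        action = "create"
    return target, action


# Paths assumed to exist already in the destination folder.
existing_files = {"/data/models/model.pkl"}

print(plan_upload("model.pkl", "/data/models", existing_files))
print(plan_upload("model.pkl", "/data/models", existing_files, overwrite=True))
print(plan_upload("model.pkl", "/data/models", existing_files, filename="model_v2.pkl"))
```

HDFS paths always use forward slashes, so `posixpath` is used here instead of `os.path`, which would produce backslashes on Windows.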