Upload Files | 大装置 Help Center
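Upload Files

File upload workflow

The APIs below upload local files. Each follows the same pattern: first call an API to obtain a presigned URL, then upload the local file to that URL. Two methods are available: simple upload and multipart upload.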

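Simple upload

Obtain upload URLs from the batch presigned upload URL API (files:batchPresign) and upload the files to them. A presigned URL lets the client upload a file directly to cloud storage without passing through an intermediate server.

The detailed steps are:

1) Obtain presigned URLs

Call the batch presigned upload URL API to obtain presigned upload URLs for multiple files: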

curl 'https://aidmp.cn-sh-01.sensecoreapi.cn/studio/rag/data/v1/jobs/b1d6104abf6b46288fd66439dd6cdbab/files:batchPresign' \
-H 'authorization: Bearer eyJhbGciOiJSUz***' \
--data-raw '{"job_id":"b1d6104abf6b46288fd66439dd6cdbab","rel_path":["base/api/README.md", "test.sh"]}'

Here rel_path lists the destination paths of the files in the knowledge base.

Example response:

{
"result": {
"base/api/README.md": "https://aoss.cn-sh-01.sensecoreapi-oss.cn/rag-system/kn/datasets/rag_f269a87d61de42508a4c8d3ed0095e56/jobs/b1d6104abf6b46288fd66439dd6cdbab/base/api/README.md?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=7A7C633FEA733228360F10AEC8B9FBF3%2F20250102%2Fdefault%2Fs3%2Faws4_request&X-Amz-Date=20250102T094849Z&X-Amz-Expires=7200&X-Amz-SignedHeaders=host&X-Amz-Signature=917067634b90308340bf109faf8faff6c13f889f5505a4a938f4f7c77a329a8c",
"test.sh": "https://aoss.cn-sh-01.sensecoreapi-oss.cn/rag-system/kn/datasets/rag_f269a87d61de42508a4c8d3ed0095e56/jobs/b1d6104abf6b46288fd66439dd6cdbab/test.sh?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=7A7C633FEA733228360F10AEC8B9FBF3%2F20250102%2Fdefault%2Fs3%2Faws4_request&X-Amz-Date=20250102T094849Z&X-Amz-Expires=7200&X-Amz-SignedHeaders=host&X-Amz-Signature=8540f54a85f224614c392fc901c121db23ca4bef49b7ca11173da803309f2814"
}
}
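The request above can be sketched in Python using only the standard library. The helper names `build_batch_presign_request` and `fetch_presigned_urls` are illustrative and not part of the service; the endpoint, header, and body fields are copied from the curl example, and the token is a placeholder. Like the curl command, the sketch sets no explicit Content-Type; add one if the service turns out to require it.

```python
import json
import urllib.request

# Endpoint prefix copied from the curl example above.
BASE_URL = "https://aidmp.cn-sh-01.sensecoreapi.cn/studio/rag/data/v1"


def build_batch_presign_request(job_id, rel_paths, token):
    """Build (but do not send) the batchPresign request shown above."""
    url = f"{BASE_URL}/jobs/{job_id}/files:batchPresign"
    body = json.dumps({"job_id": job_id, "rel_path": list(rel_paths)}).encode("utf-8")
    # Supplying `data` makes urllib send a POST, as curl does with --data-raw.
    return urllib.request.Request(
        url, data=body, headers={"Authorization": f"Bearer {token}"}
    )


def fetch_presigned_urls(job_id, rel_paths, token):
    """Send the request and return the rel_path -> presigned URL mapping."""
    request = build_batch_presign_request(job_id, rel_paths, token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["result"]
```

Keeping request construction separate from sending makes it possible to inspect the URL and body before any network call is made.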

2) Upload the files

With the presigned URLs returned by the API, a local file can be uploaded in several ways.

✔ Upload with curl

curl -X PUT -T local/test_txt.txt \
"https://aoss.cn-sh-01.sensecoreapi-oss.cn/rag-system/kn/datasets/rag_f269a87d61de42508a4c8d3ed0095e56/jobs/b1d6104abf6b46288fd66439dd6cdbab/base/api/README.md?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=7A7C633FEA733228360F10AEC8B9FBF3%2F20250102%2Fdefault%2Fs3%2Faws4_request&X-Amz-Date=20250102T094849Z&X-Amz-Expires=7200&X-Amz-SignedHeaders=host&X-Amz-Signature=917067634b90308340bf109faf8faff6c13f889f5505a4a938f4f7c77a329a8c"
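  • local/test_txt.txt: path of the local file; make sure it points to the file you want to upload
  • https://aoss.cn...: the presigned URL returned by the batch presigned upload URL API

✔ Upload with Python's requests library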

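import requests


def upload_file(file_path, presigned_url):
    """
    Upload a file to the given presigned URL.
    """
    # Stream the file from disk as the PUT body.
    with open(file_path, "rb") as f:
        response = requests.put(presigned_url, data=f)
    # Raise an exception for any 4xx/5xx response.
    response.raise_for_status()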
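When batchPresign returns several URLs, each rel_path has to be paired with a local file before uploading. A minimal sketch, assuming the local files sit under one root directory with the same relative layout as rel_path; that layout and the helper names `plan_uploads` and `upload_all` are assumptions made for illustration. It uses the standard library's `urllib.request` rather than `requests`.

```python
import urllib.request
from pathlib import Path


def plan_uploads(result, local_root):
    """Pair each rel_path in the batchPresign `result` with a local file path."""
    root = Path(local_root)
    return [(root / rel_path, url) for rel_path, url in result.items()]


def upload_all(result, local_root):
    """PUT every planned file to its presigned URL, one after another."""
    for local_path, url in plan_uploads(result, local_root):
        data = local_path.read_bytes()  # loads the whole file into memory
        request = urllib.request.Request(
            url,
            data=data,
            method="PUT",
            headers={"Content-Type": "application/octet-stream"},
        )
        # urlopen raises urllib.error.HTTPError on a 4xx/5xx response.
        with urllib.request.urlopen(request) as response:
            print(f"{local_path} -> HTTP {response.status}")
```

Calling `plan_uploads` on its own first is a cheap way to check that every local path exists before any upload starts.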

✔ Upload with Go's net/http package

package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func uploadFile(filePath, presignedURL string) error {
	// Open the local file.
	file, err := os.Open(filePath)
	if err != nil {
		return fmt.Errorf("failed to open file: %w", err)
	}
	defer file.Close()

	// Read the file metadata to learn its size.
	fileInfo, err := file.Stat()
	if err != nil {
		return fmt.Errorf("failed to get file info: %w", err)
	}

	// Build the HTTP PUT request with the file as its body.
	req, err := http.NewRequest("PUT", presignedURL, file)
	if err != nil {
		return fmt.Errorf("failed to create request: %w", err)
	}

	// Set Content-Length explicitly; it cannot be inferred from an *os.File body.
	req.ContentLength = fileInfo.Size()

	// Set the required headers.
	req.Header.Set("Content-Type", "application/octet-stream")

	// Send the request.
	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		return fmt.Errorf("failed to execute request: %w", err)
	}
	defer resp.Body.Close()

	// Check the response status.
	if resp.StatusCode != http.StatusOK {
		body, _ := io.ReadAll(resp.Body)
		return fmt.Errorf("upload failed, status: %s, body: %s", resp.Status, body)
	}

	fmt.Println("File uploaded successfully!")
	return nil
}

func main() {
	// Path of the local file.
	filePath := "/Users/konglingzhi/base/app/test_txt.txt"

	// Presigned URL returned by the API.
	presignedURL := "https://aoss.cn-sh-01.sensecoreapi-oss.cn/rag-system/kn/datasets/rag_f269a87d61de42508a4c8d3ed0095e56/jobs/b1d6104abf6b46288fd66439dd6cdbab/base/api/README.md?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=7A7C633FEA733228360F10AEC8B9FBF3%2F20250102%2Fdefault%2Fs3%2Faws4_request&X-Amz-Date=20250102T094849Z&X-Amz-Expires=7200&X-Amz-SignedHeaders=host&X-Amz-Signature=917067634b90308340bf109faf8faff6c13f889f5505a4a938f4f7c77a329a8c"

	// Upload the file.
	if err := uploadFile(filePath, presignedURL); err != nil {
		fmt.Printf("Error uploading file: %v\n", err)
	}
}

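Notes:

  • Presigned URL validity: a presigned URL is usually time-limited and must be used before it expires.
  • File size limit: make sure the uploaded file does not exceed the size limit of the presigned URL.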
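The sample URLs carry X-Amz-Date and X-Amz-Expires query parameters (7200 in the examples). Assuming the service follows the AWS Signature Version 4 convention, where X-Amz-Expires is a lifetime in seconds counted from X-Amz-Date, the expiry time can be read from the URL itself. The function names below are illustrative, and the convention is an assumption about this service rather than something the page states.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url):
    """Return the UTC time at which a SigV4-style presigned URL expires."""
    query = parse_qs(urlsplit(url).query)
    # X-Amz-Date uses the compact ISO 8601 form, e.g. 20250102T094849Z.
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    # Assumption: X-Amz-Expires is a lifetime in seconds, as in AWS SigV4.
    return signed_at + timedelta(seconds=int(query["X-Amz-Expires"][0]))


def is_expired(url, now=None):
    """Report whether the URL has expired at `now` (defaults to the current time)."""
    now = now or datetime.now(timezone.utc)
    return now >= presigned_url_expiry(url)
```

A client could call `is_expired` before each upload and request a fresh URL instead of sending a PUT that is bound to be rejected.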

With the steps above, you can upload a file to the target cloud storage service through a presigned URL.

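Multipart upload

Call the presigned multipart upload URL API to obtain presigned URLs for the parts of a large file, then complete the upload with the steps below. A presigned URL typically lets the client upload file parts directly to a cloud storage service (such as AWS S3 or Alibaba Cloud OSS) without additional authentication.

1) Obtain presigned URLs for the multipart upload

Call the presigned multipart upload URL API (files:presignMultipartUploadFileUrl) to obtain a presigned URL for each part. The response contains multiple presigned URLs, one per file part.

To upload two large files, request multipart presigned URLs twice, once per file. Example requests: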

curl 'https://aidmp.cn-sh-01.sensecoreapi.cn/studio/rag/data/v1/jobs/13ea23cc2fc14452b26682f2e1f7f577/files:presignMultipartUploadFileUrl' \
-H 'authorization: Bearer eyJhbGciOi...' \
--data-raw '{
"job_id":"13ea23cc2fc14452b26682f2e1f7f577",
"relpath":"Claude3-Opus-Multi-Instruct-5K-openai.jsonl",
"file_size":15042941
}'
curl 'https://aidmp.cn-sh-01.sensecoreapi.cn/studio/rag/data/v1/jobs/13ea23cc2fc14452b26682f2e1f7f577/files:presignMultipartUploadFileUrl' \
-H 'authorization: Bearer eyJhbGciOi...' \
--data-raw '{
"job_id":"13ea23cc2fc14452b26682f2e1f7f577",
"relpath":"GLM-4-Instruct-4K-zh-openai.jsonl",
"file_size":12682527
}'

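Here relpath is the destination path of the file in the knowledge base, and file_size is the size of the file in bytes.

Example responses (one per request):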

{
"upload_id": "0",
"list": [
{
"uri": "https://aoss.cn-sh-01.sensecoreapi-oss.cn/rag-system/kn/datasets/rag_6a53d6ce2ae74633b2e52361286c53ad/jobs/13ea23cc2fc14452b26682f2e1f7f577/Claude3-Opus-Multi-Instruct-5K-openai.jsonl?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=CD840429150E4FD2805997FE12D18A86%2F20250103%2Fdefault%2Fs3%2Faws4_request&X-Amz-Date=20250103T032922Z&X-Amz-Expires=7200&X-Amz-SignedHeaders=host&partNumber=1&uploadId=0&X-Amz-Signature=fbb6285576b7317f8d35a319339b09d8624f9f9f2182d3c3d9f71ede898ad1db",
"part_num": "1",
"part_size": "10485760",
"file_offset": "0"
},
{
"uri": "https://aoss.cn-sh-01.sensecoreapi-oss.cn/rag-system/kn/datasets/rag_6a53d6ce2ae74633b2e52361286c53ad/jobs/13ea23cc2fc14452b26682f2e1f7f577/Claude3-Opus-Multi-Instruct-5K-openai.jsonl?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=CD840429150E4FD2805997FE12D18A86%2F20250103%2Fdefault%2Fs3%2Faws4_request&X-Amz-Date=20250103T032922Z&X-Amz-Expires=7200&X-Amz-SignedHeaders=host&partNumber=2&uploadId=0&X-Amz-Signature=3252a4a0367ded96245ecbbfba23e0a7e4026f5ee28a624630ea84d9510f31cc",
"part_num": "2",
"part_size": "4557181",
"file_offset": "10485760"
}
]
}
{
"upload_id": "0",
"list": [
{
"uri": "https://aoss.cn-sh-01.sensecoreapi-oss.cn/rag-system/kn/datasets/rag_6a53d6ce2ae74633b2e52361286c53ad/jobs/13ea23cc2fc14452b26682f2e1f7f577/GLM-4-Instruct-4K-zh-openai.jsonl?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=CD840429150E4FD2805997FE12D18A86%2F20250103%2Fdefault%2Fs3%2Faws4_request&X-Amz-Date=20250103T032922Z&X-Amz-Expires=7200&X-Amz-SignedHeaders=host&partNumber=1&uploadId=0&X-Amz-Signature=59dcac6b639cc65d634633ccb22efabeb4379e7dbc93ee9ac5effe8deed1bfb3",
"part_num": "1",
"part_size": "10485760",
"file_offset": "0"
},
{
"uri": "https://aoss.cn-sh-01.sensecoreapi-oss.cn/rag-system/kn/datasets/rag_6a53d6ce2ae74633b2e52361286c53ad/jobs/13ea23cc2fc14452b26682f2e1f7f577/GLM-4-Instruct-4K-zh-openai.jsonl?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=CD840429150E4FD2805997FE12D18A86%2F20250103%2Fdefault%2Fs3%2Faws4_request&X-Amz-Date=20250103T032922Z&X-Amz-Expires=7200&X-Amz-SignedHeaders=host&partNumber=2&uploadId=0&X-Amz-Signature=56141b77a656ed7d4c439edec54c86435ab3c277e066c537eaae7483ddf6330a",
"part_num": "2",
"part_size": "2196767",
"file_offset": "10485760"
}
]
}
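In the sample responses, part_num, part_size, and file_offset arrive as strings, and the part sizes of each file add up to the file_size sent in the request (for example 10485760 + 4557181 = 15042941). A small sketch that converts these fields to integers and checks that the parts cover the file without gaps; `parse_parts` is a hypothetical helper, and the contiguity check is an assumption based on the sample data rather than a documented guarantee.

```python
def parse_parts(response, file_size):
    """Convert the string fields of a presign response to ints and sanity-check them."""
    parts = [
        {
            "uri": item["uri"],
            "part_num": int(item["part_num"]),
            "part_size": int(item["part_size"]),
            "file_offset": int(item["file_offset"]),
        }
        for item in response["list"]
    ]
    parts.sort(key=lambda part: part["part_num"])

    # Each part should start where the previous one ended.
    expected_offset = 0
    for part in parts:
        if part["file_offset"] != expected_offset:
            raise ValueError(
                f"part {part['part_num']} does not start at offset {expected_offset}"
            )
        expected_offset += part["part_size"]

    # Together the parts should cover the whole file.
    if expected_offset != file_size:
        raise ValueError(f"parts cover {expected_offset} bytes, expected {file_size}")
    return parts
```

The returned dictionaries use integer offsets and sizes, so they can be passed directly to code that calls `seek` and `read` on the local file.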

2) Upload the local file

✔ Upload with curl

#!/bin/bash

# Part definitions: "<presigned URL> <offset> <size>"
file="/Users/***/data/GLM-4-Instruct-4K-zh-openai.jsonl"
declare -a parts=(
"https://aoss.cn-sh-01.sensecoreapi-oss.cn/rag-system/kn/datasets/rag_6a53d6ce2ae74633b2e52361286c53ad/jobs/13ea23cc2fc14452b26682f2e1f7f577/GLM-4-Instruct-4K-zh-openai.jsonl?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=CD840429150E4FD2805997FE12D18A86%2F20250103%2Fdefault%2Fs3%2Faws4_request&X-Amz-Date=20250103T032922Z&X-Amz-Expires=7200&X-Amz-SignedHeaders=host&partNumber=1&uploadId=0&X-Amz-Signature=59dcac6b639cc65d634633ccb22efabeb4379e7dbc93ee9ac5effe8deed1bfb3 0 10485760"
"https://aoss.cn-sh-01.sensecoreapi-oss.cn/rag-system/kn/datasets/rag_6a53d6ce2ae74633b2e52361286c53ad/jobs/13ea23cc2fc14452b26682f2e1f7f577/GLM-4-Instruct-4K-zh-openai.jsonl?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=CD840429150E4FD2805997FE12D18A86%2F20250103%2Fdefault%2Fs3%2Faws4_request&X-Amz-Date=20250103T032922Z&X-Amz-Expires=7200&X-Amz-SignedHeaders=host&partNumber=2&uploadId=0&X-Amz-Signature=56141b77a656ed7d4c439edec54c86435ab3c277e066c537eaae7483ddf6330a 10485760 2196767"
)

# Upload each part in turn
for part in "${parts[@]}"; do
    IFS=' ' read -r url offset size <<< "$part"
    echo "Uploading part with offset $offset and size $size to $url"

    # -i prints the response headers (including the ETag needed in step 3);
    # --fail makes curl exit non-zero on an HTTP error status.
    curl -i --fail -X PUT \
        -H "Content-Length: $size" \
        --data-binary @<(dd if="$file" bs=1 skip="$offset" count="$size") \
        "$url"

    if [ $? -ne 0 ]; then
        echo "Failed to upload part at offset $offset"
        exit 1
    fi
done

echo "All parts uploaded successfully."

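  • file: path of the local file; make sure it points to the file you want to upload
  • parts=(): each element holds the URL returned by the API, the offset, and the part size
  • IFS splits each element into url, offset, and size.
  • curl performs the upload, reading only the specified byte range of the file.

Example response (headers of a successful part upload; the Etag value is needed in step 3):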

HTTP/1.1 200 OK
Server: nginx
Date: Fri, 03 Jan 2025 08:24:23 GMT
Content-Length: 0
Accept-Ranges: bytes
Content-Security-Policy: block-all-mixed-content
Etag: "9377b927a7a9b229a7bf4a1ca81a0c34"
Strict-Transport-Security: max-age=31536000; includeSubDomains
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block
Strict-Transport-Security: max-age=31536000; includeSubDomains; preload

✔ Upload with Python's requests library

import requests


def upload_part(uri, data):
    """
    Upload one part to the given presigned URI.
    """
    headers = {"Content-Length": str(len(data))}
    response = requests.put(uri, data=data, headers=headers)

    if response.status_code != 200:
        raise Exception(f"Upload failed: {response.status_code}, {response.text}")

    print(f"Uploaded part successfully to {uri}")


def upload_file(file_path, parts):
    """
    Upload a file part by part.
    :param file_path: path of the local file
    :param parts: list of part descriptions, each with a URI, an offset, and a size
    """
    with open(file_path, "rb") as f:
        for part in parts:
            # Move the file pointer to the part's offset.
            f.seek(part["file_offset"])
            # Read exactly one part's worth of data.
            data = f.read(part["part_size"])

            # Upload the part.
            print(f"Uploading part {part['part_num']}...")
            upload_part(part["uri"], data)


if __name__ == "__main__":
    # Sample part descriptions (offsets and sizes as integers).
    parts = [
        {
            "uri": "https://aoss.cn-sh-01.sensecoreapi-oss.cn/...",
            "part_num": "1",
            "part_size": 10485760,
            "file_offset": 0
        },
        {
            "uri": "https://aoss.cn-sh-01.sensecoreapi-oss.cn/...",
            "part_num": "2",
            "part_size": 2196767,
            "file_offset": 10485760
        }
    ]

    # Path of the local file.
    file_path = "largefile.jsonl"

    try:
        upload_file(file_path, parts)
        print("All parts uploaded successfully.")
    except Exception as e:
        print(f"Error: {e}")

✔ Upload with Go's net/http package

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"os"
)

func uploadPart(uri string, filePath string, offset, size int64) error {
	// Open the local file.
	file, err := os.Open(filePath)
	if err != nil {
		return err
	}
	defer file.Close()

	// Seek to the start of the part.
	_, err = file.Seek(offset, io.SeekStart)
	if err != nil {
		return err
	}

	// Read the part into memory.
	buffer := make([]byte, size)
	_, err = io.ReadFull(file, buffer)
	if err != nil {
		return err
	}

	// Send the HTTP PUT request. http.NewRequest sets req.ContentLength
	// automatically for a *bytes.Reader body, so no header is set by hand.
	req, err := http.NewRequest("PUT", uri, bytes.NewReader(buffer))
	if err != nil {
		return err
	}

	client := &http.Client{}
	resp, err := client.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("upload failed with status: %s", resp.Status)
	}

	// Print the ETag as well; it is needed when completing the upload.
	fmt.Printf("Uploaded part successfully: %s (ETag: %s)\n", uri, resp.Header.Get("ETag"))
	return nil
}

func main() {
	parts := []struct {
		URI    string
		Offset int64
		Size   int64
	}{
		{
			URI:    "https://aoss.cn-sh-01.sensecoreapi-oss.cn/rag-system/kn/datasets/rag_6a53d6ce2ae74633b2e52361286c53ad/jobs/13ea23cc2fc14452b26682f2e1f7f577/GLM-4-Instruct-4K-zh-openai.jsonl?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=CD840429150E4FD2805997FE12D18A86%2F20250103%2Fdefault%2Fs3%2Faws4_request&X-Amz-Date=20250103T032922Z&X-Amz-Expires=7200&X-Amz-SignedHeaders=host&partNumber=1&uploadId=0&X-Amz-Signature=59dcac6b639cc65d634633ccb22efabeb4379e7dbc93ee9ac5effe8deed1bfb3",
			Offset: 0,
			Size:   10485760,
		},
		{
			URI:    "https://aoss.cn-sh-01.sensecoreapi-oss.cn/rag-system/kn/datasets/rag_6a53d6ce2ae74633b2e52361286c53ad/jobs/13ea23cc2fc14452b26682f2e1f7f577/GLM-4-Instruct-4K-zh-openai.jsonl?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=CD840429150E4FD2805997FE12D18A86%2F20250103%2Fdefault%2Fs3%2Faws4_request&X-Amz-Date=20250103T032922Z&X-Amz-Expires=7200&X-Amz-SignedHeaders=host&partNumber=2&uploadId=0&X-Amz-Signature=56141b77a656ed7d4c439edec54c86435ab3c277e066c537eaae7483ddf6330a",
			Offset: 10485760,
			Size:   2196767,
		},
	}

	for _, part := range parts {
		err := uploadPart(part.URI, "GLM-4-Instruct-4K-zh-openai.jsonl", part.Offset, part.Size)
		if err != nil {
			fmt.Println("Error uploading part:", err)
		}
	}
}

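3) Complete the multipart upload

After all parts are uploaded, call the complete multipart upload API (files:completeMultipartUploadFile) to tell the server to merge the parts.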

curl 'https://aidmp.cn-sh-01.sensecoreapi.cn/studio/rag/data/v1/jobs/13ea23cc2fc14452b26682f2e1f7f577/files:completeMultipartUploadFile' \
-H 'authorization: Bearer eyJhbGciOi...' \
--data-raw '{
"job_id":"13ea23cc2fc14452b26682f2e1f7f577",
"upload_id":"0",
"relpath":"Claude3-Opus-Multi-Instruct-5K-openai.jsonl",
"list":[
{
"part_num":"1",
"etag":"\"863ded524478b3cb3471804cfa779ce1\""
},
{
"part_num":"2",
"etag":"\"d1f2db7422a7b0d2159487eea22749ab\""
}
]
}'

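In this example:

  • upload_id is the upload ID returned by the presigned URL API.
  • list contains the information for every uploaded part (etag and part_num).
  • etag: each time a part is uploaded successfully, the cloud storage service returns an ETag value; supply that value when completing the multipart upload.

Note: upload the parts in the correct order, and list them in that same order when completing the upload.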
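The request body above can be assembled from the ETag values collected during upload. A minimal sketch that sorts the entries by part number before serializing, so the ordering note is satisfied even if parts were recorded out of order; `build_complete_body` is a hypothetical helper, and the field names are copied from the curl example.

```python
import json


def build_complete_body(job_id, upload_id, relpath, entries):
    """Serialize the completion request body with parts in ascending part_num order."""
    # part_num is a string in the examples, so compare numerically.
    ordered = sorted(entries, key=lambda entry: int(entry["part_num"]))
    return json.dumps(
        {
            "job_id": job_id,
            "upload_id": upload_id,
            "relpath": relpath,
            "list": ordered,
        }
    )
```

`json.dumps` escapes the double quotes inside each etag, which yields the same `\"...\"` form as the hand-written example.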

After these steps, the file is uploaded and merged into a complete file on the server.