Uploading large or bulk files: Step-by-step instructions

By Lucas Laughery¹, Aishwarya Y Puranam¹, Sumudine Fernando¹, Merve Usta¹

¹ Purdue University

Licensed according to this deed.

Abstract

Follow these instructions to upload large or bulk files to your datasets.

Files larger than 5 GB cannot be uploaded using the Local Upload button for your dataset. To load large files into your dataset, first transfer them to datacenterhub using SFTP (Secure File Transfer Protocol). After the transfer, you can load the files into your dataset using the Server Upload button. Note that SFTP works for files of any size, which also makes it useful for uploading files in bulk.
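To decide which files need the SFTP route, the size rule above can be checked before uploading. The following is a minimal sketch, not part of the datacenterhub tooling: the function names are hypothetical, and it assumes "5 GB" means 5 × 1024³ bytes (the site may count 5 × 10⁹ bytes instead).

```python
from pathlib import Path

# Assumption: "5 GB" is read as 5 GiB; the site's exact threshold is not stated.
LOCAL_UPLOAD_LIMIT_BYTES = 5 * 1024**3


def needs_sftp(size_bytes: int, limit_bytes: int = LOCAL_UPLOAD_LIMIT_BYTES) -> bool:
    """Return True when a file is too large for the Local Upload button."""
    return size_bytes > limit_bytes


def split_by_upload_method(folder: str) -> tuple[list[str], list[str]]:
    """Split the regular files directly inside `folder` into (local_ok, sftp_only)."""
    local_ok, sftp_only = [], []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue  # skip subdirectories and other non-file entries
        target = sftp_only if needs_sftp(path.stat().st_size) else local_ok
        target.append(path.name)
    return local_ok, sftp_only
```

Files in the second list must go through SFTP and Server Upload; files in the first list can use either route.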

There are many methods for using SFTP to transfer your files to datacenterhub. Step-by-step instructions for two popular options are given below. 

Option 1: SFTP Client (e.g. FileZilla)

Before you begin, download and install an SFTP client such as FileZilla or WinSCP.

Each step below lists what to do, when to do it (one time or each time you upload), and the details.

Step 1: Enter setup information (one time)

Enter or select the following information in the SFTP client:

  • Host: datacenterhub.org
  • Port: 22
  • Protocol: SFTP - SSH File Transfer Protocol
  • User: Your datacenterhub.org username (example: apuranam)
  • Password: Your datacenterhub.org password
  • Click on Connect

You may also wish to save this configuration to avoid entering these credentials again.

Step 2: Accept host key (one time)

Click on OK in the dialog box that appears to accept the host key and connect.

Step 3: Change remote directory (each time)

Change the remote site path in the client window to /db/tmp/username

where username is your datacenterhub.org username (example: apuranam)
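The settings from Steps 1 and 3 can be kept together so the remote path is built the same way every time. This is only an illustrative sketch: the dictionary keys and the function name are hypothetical, while the host, port, and /db/tmp/ prefix come from the steps above.

```python
import posixpath

# Values taken from Step 1; the key names are illustrative only.
CONNECTION = {
    "host": "datacenterhub.org",
    "port": 22,
    "protocol": "sftp",
}


def remote_upload_dir(username: str) -> str:
    """Return the upload directory /db/tmp/<username> described in Step 3."""
    if not username or "/" in username:
        raise ValueError("username must be a non-empty name without '/'")
    # Remote paths use forward slashes regardless of the local operating system.
    return posixpath.join("/db/tmp", username)


print(remote_upload_dir("apuranam"))  # prints /db/tmp/apuranam
```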

Step 4: Upload files (each time)

Upload files to the folder on the server by dragging and dropping them from the local site pane.

Step 5: Assign files to datasets (each time)

Your files will now be visible on datacenterhub.org in all of your current datasets. They can be accessed using the “Server Upload” button.

Select the files you wish to assign to the experiment/case and click Upload.

 

Option 2: Command Line

On non-Unix systems, you will need to download and install a command-line SFTP tool such as PSFTP, the SFTP client distributed with PuTTY. The remainder of this tutorial uses PuTTY's SFTP client on a Windows computer.

Step 1: Connect (each time)

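After starting PuTTY SFTP (PSFTP), type the following at its prompt. (From a terminal on a Unix-like system with OpenSSH, the equivalent is to run sftp username@datacenterhub.org from the shell.)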

open username@datacenterhub.org

where username is your datacenterhub.org username. Press Enter to continue. When prompted, type your password and press Enter to connect.

Step 2: Change remote directory (each time)

By default, your remote directory will be your datacenterhub home directory.
You will need to change this to the “big data upload directory.” Enter the following command to do so:

cd /db/tmp/yourusername

Step 3: Change local directory (each time)

In this example, the files that we wish to upload are in C:\Data\. Files can always be uploaded by entering the full file path and name (e.g. C:\Data\samplefile1.txt). However, it is often simpler to change your local working directory first. Check your current local working directory using the following command:

lpwd

Our current directory is different from the directory where our files are located, so we change the local directory using the following command:

lcd newdirectory

(Note: In your case, newdirectory will be the directory where your files are located.)
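Before running lcd, it can help to confirm that the directory exists and holds the files you expect. The sketch below is a hypothetical helper in Python, not an SFTP command; it reads the local file system only.

```python
import os


def check_local_dir(local_dir: str, expected_names: list[str]) -> list[str]:
    """Return the expected file names that are missing from `local_dir`."""
    if not os.path.isdir(local_dir):
        raise NotADirectoryError(local_dir)
    present = set(os.listdir(local_dir))
    # Preserve the caller's order so the report is easy to read.
    return [name for name in expected_names if name not in present]
```

An empty result means every expected file is in the directory that you plan to pass to lcd.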

Step 4: Upload files (each time)

Files can be transferred using a variety of commands. For a single file, use the put command:

put samplefile1.txt

For multiple files, use the mput command:

mput samplefile2.txt samplefile3.txt

You can also use wildcard operators to upload all files with a given extension (for example, text files):

mput *.txt
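Because Steps 2 to 4 are repeated for every upload, the same commands can be written to a batch file and replayed. The sketch below only generates the text of such a file; the function name is hypothetical, and it assumes paths and file names contain no spaces (quoting rules differ between clients).

```python
def build_sftp_batch(username: str, local_dir: str, patterns: list[str]) -> str:
    """Return batch-file text that repeats Steps 2 to 4 for the given user."""
    if not patterns:
        raise ValueError("at least one file name or wildcard is required")
    lines = [
        f"cd /db/tmp/{username}",  # Step 2: change remote directory
        f"lcd {local_dir}",        # Step 3: change local directory
    ]
    # Step 4: one mput command per file name or wildcard
    lines += [f"mput {pattern}" for pattern in patterns]
    lines.append("bye")  # close the session when the uploads finish
    return "\n".join(lines) + "\n"


print(build_sftp_batch("apuranam", r"C:\Data", ["*.txt"]), end="")
```

Both PSFTP and the OpenSSH sftp client accept a batch file through their -b option. Depending on the client, batch mode may require non-interactive authentication such as a key, so treat this as a starting point rather than a tested workflow for datacenterhub.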

Step 5: Assign files to datasets (each time)

Your files will now be visible on datacenterhub.org in all of your current datasets. They can be accessed using the “Server Upload” button.

Select the files you wish to assign to the experiment/case and click Upload.

Cite this work

Researchers should cite this work as follows:

  • Lucas Laughery; Aishwarya Y Puranam; Sumudine Fernando; Merve Usta (2016), "Uploading large or bulk files: Step-by-step instructions," https://datacenterhub.org/resources/389.
