Kaggle dataset import into colab notebook

Hello everyone, in this tutorial we will learn how to import a Kaggle dataset into a Google Colab notebook. First of all, rename your notebook to a name that suits your work or project.

Colaboratory is a Google initiative intended to help promote machine learning in education and science. It is a Jupyter notebook environment that needs very little configuration and runs completely in the cloud.

Step 1: Install the dependencies

Let's start by installing the dependencies. This can be done with

!pip install kaggle

Run the cell. The output log shows the progress of the Kaggle installation.

Step 2: Get the Kaggle API token

Now go to your Kaggle account and open the “My Account” settings. There, click “Create New API Token”. This downloads the kaggle.json file to your machine.

Step 3: Import the Kaggle API token into the Google Colab notebook

Once that is done, come back to your Google Colab notebook and type

from google.colab import files

files.upload()

Execute this cell. A file dialog appears; click “Choose Files” and select the kaggle.json file you downloaded earlier. You can follow the upload progress in the cell's output.
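As a quick sanity check, the uploaded bytes can be inspected before they are copied anywhere. The sketch below assumes that files.upload() returns a dictionary mapping file names to their contents as bytes, and that kaggle.json is a JSON object with `username` and `key` fields. The helper name `check_token` and the fake token are hypothetical, for illustration only.

```python
import json

def check_token(uploaded, name="kaggle.json"):
    """Return True if `uploaded` holds a plausible Kaggle token file.

    `uploaded` is assumed to be the dict returned by files.upload(),
    i.e. {file name: file contents as bytes}.
    """
    raw = uploaded.get(name)
    if raw is None:
        return False  # file missing, or uploaded under a different name
    try:
        token = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, ValueError):
        return False  # not valid UTF-8 encoded JSON
    # Both fields must be present and non-empty.
    return (
        isinstance(token, dict)
        and bool(token.get("username"))
        and bool(token.get("key"))
    )

# Example with a fake token (hypothetical values, not real credentials).
fake_upload = {"kaggle.json": b'{"username": "someone", "key": "0123abcd"}'}
print(check_token(fake_upload))
```

In a Colab cell this could be used as `check_token(files.upload())`, so that a wrongly named or truncated file is noticed before the next step.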

Once the upload reaches 100%, the next commands are

!mkdir -p ~/.kaggle

!cp kaggle.json ~/.kaggle

!chmod 600 ~/.kaggle/kaggle.json

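Execute these commands. They create the ~/.kaggle folder, copy the token into it, and restrict the file's permissions to your own user.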
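For readers who prefer to stay in Python, the same three steps can be sketched with the standard library. This is only an illustrative equivalent of the shell commands above; the function name `install_token` and its `home` parameter are hypothetical and exist mainly so the sketch can be tried against a scratch folder.

```python
import os
import shutil
from pathlib import Path

def install_token(src="kaggle.json", home=None):
    """Copy the token to <home>/.kaggle/kaggle.json, readable by the owner only."""
    home = Path(home) if home is not None else Path.home()
    target_dir = home / ".kaggle"
    target_dir.mkdir(parents=True, exist_ok=True)  # like: mkdir -p ~/.kaggle
    target = target_dir / "kaggle.json"
    shutil.copyfile(src, target)                   # like: cp kaggle.json ~/.kaggle
    os.chmod(target, 0o600)                        # like: chmod 600 ~/.kaggle/kaggle.json
    return target
```

Calling `install_token()` with no arguments in the notebook would place the uploaded kaggle.json under the current user's home directory, which is where the shell commands put it.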

Step 4: Get your dataset's API command

To import a Kaggle dataset, go to the Kaggle website and search for the dataset you want. I am working with a fruit image dataset, so I selected the Fruits 360 dataset; choose one that fits your requirements. Kaggle hosts a huge number of datasets. Make sure the dataset you pick is smaller than the storage available in your Google Colab session. My available storage is 70 GB, so I chose a dataset smaller than 70 GB. On the dataset's page, click “Copy API command”. Then go back to your Google Colab notebook, paste the copied command, and add an exclamation mark in front of it. For me it is

!kaggle datasets download -d moltean/fruits

Next, execute this command.
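The advice about storage can be checked from inside the notebook before downloading. The sketch below is one possible way to do it with the standard library; the helper name `has_room`, the default path, and the safety factor of 2 (chosen on the assumption that the zip file and its extracted copy will sit on disk at the same time) are illustrative and not part of the Kaggle tooling.

```python
import shutil

def has_room(dataset_bytes, path="/", safety_factor=2.0):
    """Return True if `path` has at least dataset_bytes * safety_factor bytes free."""
    free = shutil.disk_usage(path).free
    return free >= dataset_bytes * safety_factor

# Report the free space on the disk that holds `path`.
free_gb = shutil.disk_usage("/").free / 1024**3
print(f"Free space: {free_gb:.1f} GB")
```

For example, `has_room(5 * 1024**3)` asks whether a 5 GB download plus its extracted copy would plausibly fit.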

Step 5: Extract the dataset

Once you do that, the Kaggle dataset is downloaded to your Google Colab session. Refresh the file browser in the left sidebar to see it. The dataset arrives as a zip file, so it has to be extracted before you can work with it.

To extract it, paste the code given below and change the file name to match your dataset:

from zipfile import ZipFile

file_name = "fruits.zip"  # Your dataset's zip file name

with ZipFile(file_name, 'r') as archive:
    archive.extractall()  # extract everything into the current directory
    print('Done')

This code extracts the dataset from the zip file. After executing it, you have to wait for some time depending on the size of your dataset; when extraction finishes, the output shows “Done”. Refresh the file browser to view the extracted dataset in Google Colab.
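If you would rather keep the extracted files in a folder of their own, the same idea can be wrapped in a small function. This is a sketch only: the name `extract_dataset` and the `dest` parameter are hypothetical, and the count it returns is simply the number of entries listed in the archive.

```python
from pathlib import Path
from zipfile import ZipFile

def extract_dataset(zip_path, dest="."):
    """Extract `zip_path` into `dest` and return the number of archive entries."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)  # create the target folder if needed
    with ZipFile(zip_path, "r") as archive:
        archive.extractall(dest)
        return len(archive.namelist())
```

A call such as `extract_dataset("fruits.zip", "data")` would unpack the archive under a `data` folder instead of the current directory.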

So, you have successfully learned how to import a Kaggle dataset into a Google Colab notebook.

Step 6: Work with the dataset (Optional)

Now you are ready to work with your Kaggle dataset in Google Colab. As a reference for how to work with the dataset, here is a pre-processing step on my fruit image dataset: I calculate the average color value of each image and store it in a list, for every image in the chosen folder.

First things first,

import cv2, os, glob

Write it in a cell and execute it. Next, get the path of your image dataset folder: go to the dataset folder in the file browser, right-click it, and choose the “Copy path” option.

After getting the dataset's path, run the code below to get your result.

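img_dir = "/content/fruits-360/Test/Apple Braeburn"  # Directory containing the images

data_path = os.path.join(img_dir, '*g')  # '*g' matches names ending in "g", e.g. .jpg, .jpeg, .png

image_paths = glob.glob(data_path)  # named image_paths so it does not shadow google.colab's `files`

data = []  # one average color per image

for f1 in image_paths:
    img = cv2.imread(f1)[:, :, ::-1]  # OpenCV loads BGR; reversing the channel axis gives RGB (assumed intent)
    average = img.mean(axis=0).mean(axis=0)  # mean over rows, then over columns
    data.append(average)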
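To make explicit what `img.mean(axis=0).mean(axis=0)` computes, here is a plain-Python sketch on a tiny hand-made image. It uses nested lists (rows, then pixels, then channel values) instead of a NumPy array, and the helper name `average_color` and the 2×2 image are illustrative assumptions. Because every row has the same number of pixels, averaging over rows and then over columns gives the same result as averaging each channel over all pixels.

```python
def average_color(image):
    """Per-channel mean of an image given as rows -> pixels -> channel values."""
    n_pixels = sum(len(row) for row in image)
    n_channels = len(image[0][0])
    totals = [0.0] * n_channels
    for row in image:
        for pixel in row:
            for c, value in enumerate(pixel):
                totals[c] += value  # accumulate each channel separately
    return [t / n_pixels for t in totals]

# A 2x2 image with three channels per pixel.
tiny = [
    [[255, 0, 0], [0, 0, 0]],
    [[255, 0, 0], [0, 255, 0]],
]
print(average_color(tiny))  # [127.5, 63.75, 0.0]
```

Each entry appended to `data` in the loop above is the NumPy counterpart of this list: one mean value per color channel.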

Thanks for reading. Feel free to like and share.
