What image labelling is and how to start your first machine learning dataset

A beginner-friendly introduction to image labelling, why it matters, and the simplest way to prepare your first dataset for a machine learning project.

  • 15 March 2026
  • 4 min read
  • BBoxML Team

If you are brand new to machine learning, image labelling is one of the first practical jobs you will run into. It sounds technical, but the idea is simple: you show a computer examples of what you want it to notice.

For an image model, those examples usually start with humans looking at pictures and marking the important things in them. That marking process is called image labelling or annotation.

If you want to move from the idea stage to a real dataset quickly, pair this guide with BBoxML's Getting Started flow. It turns the basics here into a clear next step: create a project, upload images, label a small batch, and prepare your first export.

What image labelling actually means

Imagine you want a model to spot dogs in photos.

You cannot just tell the computer "this is a dog" once and expect it to understand. You need to give it many examples. For each example image, you mark where the dog is and attach the correct label. During training, the model learns patterns from those examples.

That means a labelled dataset is really just a teaching set:

  • the image is the example
  • the label says what matters in the image
  • the collection of many labelled images becomes training data

In BBoxML, one common way to do this is by drawing a bounding box around an object and assigning it a class name such as dog, cat, or car.
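To make the idea concrete, here is a minimal Python sketch of what a few labelled images might look like as data. The class names `BoundingBox` and `LabelledImage`, the file names, and the choice to store each box as pixel corners are illustrative assumptions, not BBoxML's internal format.

```python
from dataclasses import dataclass, field


@dataclass
class BoundingBox:
    """One labelled object: a class name plus a box in pixel coordinates."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


@dataclass
class LabelledImage:
    """One training example: the image file and every box drawn on it."""
    image_path: str
    width: int
    height: int
    boxes: list[BoundingBox] = field(default_factory=list)


# A tiny teaching set (hypothetical files): each image is the example,
# and each box says what matters in it.
dataset = [
    LabelledImage("park_001.jpg", 640, 480, [BoundingBox("dog", 120, 200, 310, 420)]),
    LabelledImage("street_014.jpg", 800, 600, [
        BoundingBox("dog", 50, 300, 180, 560),
        BoundingBox("dog", 400, 320, 520, 580),
    ]),
]

total_boxes = sum(len(img.boxes) for img in dataset)
print(f"{len(dataset)} images, {total_boxes} labelled objects")
```

Many labelled images collected like this is what "training data" means in practice.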

Why labelling matters so much

When people first hear about machine learning, they often focus on the model. In practice, beginners usually get better results by focusing on the dataset first.

If the labels are unclear, inconsistent, or incomplete, the model learns from messy teaching material. If the labels are accurate and consistent, the model has a much better chance of learning the right pattern.

This is why image labelling is not busywork. It is one of the most important parts of the whole project.

What a first project should look like

Your first machine learning dataset does not need to be large or complicated.

A good first project usually looks like this:

  1. Pick one simple task.
  2. Choose a small set of clear labels.
  3. Label a manageable batch of images.
  4. Export the results in a format your training workflow can use.

For example, you might start with:

  • one object type, such as dog
  • 50 to 200 images
  • a single rule for what should be boxed

That is enough to learn the workflow without getting buried in edge cases too early.
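One way to keep a first project small is to write the plan down and check it before you start. The sketch below is hypothetical: the field names and the `check_project_scope` helper are made up for illustration, and the 50 to 200 image range simply mirrors the example above.

```python
# Hypothetical plan for a first project; the field names are illustrative only.
first_project = {
    "task": "find dogs in outdoor photos",
    "classes": ["dog"],
    "target_image_count": 120,
    "rule": "box every visible dog, including partly hidden ones",
}


def check_project_scope(project: dict) -> list[str]:
    """Return warnings if the plan looks bigger than a first project needs."""
    warnings = []
    if len(project["classes"]) > 1:
        warnings.append("Consider starting with a single class.")
    if not 50 <= project["target_image_count"] <= 200:
        warnings.append("Aim for roughly 50 to 200 images on a first pass.")
    if not project.get("rule"):
        warnings.append("Write down one labelling rule before you start.")
    return warnings


print(check_project_scope(first_project))  # → []
```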

How to label images for the first time

If you are about to create your first dataset, this sequence works well:

1. Decide what the model should notice

Be specific. "Animals" is broad. "Dogs in outdoor photos" is much clearer.

The clearer the goal, the easier it is to decide what should and should not be labelled.

2. Write down your label rules

Before you start drawing boxes, decide the rules you will follow.

Examples:

  • Should partly hidden objects still be labelled?
  • Should very small objects be ignored?
  • Should blurry objects be included?

These decisions matter because consistency is often more important than perfection.
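Writing the rules down precisely makes them easy to apply the same way every time. Here is a minimal Python sketch of that idea. The `LabelRules` and `should_label` names are hypothetical, and the values (10 pixels minimum, blurry objects excluded) are example choices rather than recommendations.

```python
from dataclasses import dataclass


@dataclass
class LabelRules:
    """Written-down labelling rules, so every image is judged the same way."""
    label_partly_hidden: bool = True
    min_box_side_px: int = 10   # example threshold: skip objects smaller than this
    label_blurry: bool = False


def should_label(rules: LabelRules, width_px: float, height_px: float,
                 partly_hidden: bool, blurry: bool) -> bool:
    """Apply the same rules to every candidate object."""
    if partly_hidden and not rules.label_partly_hidden:
        return False
    if min(width_px, height_px) < rules.min_box_side_px:
        return False
    if blurry and not rules.label_blurry:
        return False
    return True


rules = LabelRules()
print(should_label(rules, 150, 90, partly_hidden=True, blurry=False))   # True
print(should_label(rules, 6, 8, partly_hidden=False, blurry=False))     # False: too small
print(should_label(rules, 200, 180, partly_hidden=False, blurry=True))  # False: blurry
```

Whatever values you pick, the point is that the same question always gets the same answer.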

3. Keep your classes simple

Beginners often create too many labels too soon. Start with the smallest useful set.

Good starting approach | Harder starting approach
-----------------------|--------------------------------------------------------
dog                    | small-dog, large-dog, puppy, running-dog, sleeping-dog
car                    | sedan, hatchback, SUV, pickup, van

You can always add more detail later once the basic workflow is stable.
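A small, fixed class list also makes mistakes easy to catch. The Python sketch below is an illustration, not BBoxML's behaviour: it assigns each class an integer ID in list order (the way YOLO-style exports typically number classes) and refuses any name that is not on the agreed list. `CLASSES` and `class_id` are hypothetical names.

```python
# A deliberately small, fixed class list. IDs follow list order.
CLASSES = ["dog"]
CLASS_TO_ID = {name: idx for idx, name in enumerate(CLASSES)}


def class_id(name: str) -> int:
    """Look up a class ID, refusing names that are not in the agreed list."""
    if name not in CLASS_TO_ID:
        raise ValueError(f"Unknown class {name!r}; allowed classes: {CLASSES}")
    return CLASS_TO_ID[name]


print(class_id("dog"))  # 0

try:
    class_id("puppy")   # not on the list, so it is rejected
except ValueError as err:
    print(err)
```

Adding a class later is then a deliberate edit to one list rather than something that creeps in image by image.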

4. Label a small batch first

Do not wait until you have labelled thousands of images to review your work.

Label a small batch, then stop and check:

  • are the boxes placed consistently?
  • are class names clear?
  • are there confusing edge cases that need rules?

This quick review saves a lot of rework later.
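Part of that review can be automated. The Python sketch below assumes a hypothetical layout where each label is a tuple of image name, image size, class name and box corners in pixels. It counts labels per class and flags boxes with unexpected class names, boxes that run outside the image, and boxes below an example size threshold. It does not judge whether a box is placed well; that still needs a human look.

```python
from collections import Counter

ALLOWED_CLASSES = {"dog"}
MIN_SIDE_PX = 10  # example threshold; match whatever your written rules say

# Hypothetical layout: (image, width, height, class, x_min, y_min, x_max, y_max)
batch = [
    ("park_001.jpg", 640, 480, "dog", 120, 200, 310, 420),
    ("park_002.jpg", 640, 480, "Dog", 30, 40, 200, 300),
    ("park_003.jpg", 640, 480, "dog", 500, 100, 700, 300),
    ("park_004.jpg", 640, 480, "dog", 10, 10, 14, 16),
]


def review_batch(labels):
    """Return per-class counts and a list of labels worth a second look."""
    counts = Counter(label[3] for label in labels)
    issues = []
    for image, w, h, cls, x0, y0, x1, y1 in labels:
        if cls not in ALLOWED_CLASSES:
            issues.append(f"{image}: unexpected class name {cls!r}")
        if x0 < 0 or y0 < 0 or x1 > w or y1 > h:
            issues.append(f"{image}: box extends outside the image")
        if x1 <= x0 or y1 <= y0:
            issues.append(f"{image}: box has zero or negative size")
        elif min(x1 - x0, y1 - y0) < MIN_SIDE_PX:
            issues.append(f"{image}: box is smaller than {MIN_SIDE_PX}px")
    return counts, issues


counts, issues = review_batch(batch)
print(dict(counts))
for issue in issues:
    print(issue)
```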

Common beginner mistakes

Here are a few problems that show up again and again in first projects:

  • changing class names halfway through the dataset
  • labelling some difficult examples but skipping similar ones later
  • starting with too many categories
  • collecting images before deciding what "good" labels look like

None of these mistakes are unusual. They are part of the learning curve. The goal is simply to catch them early.
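The first mistake on that list, class names drifting partway through a dataset, often shows up as spellings that differ only slightly. Here is a small Python sketch that groups names differing only by case, spaces, hyphens or underscores. The `find_name_variants` helper is hypothetical, and what counts as "the same name" is an assumption you may want to adjust.

```python
from collections import defaultdict


def find_name_variants(class_names):
    """Group class names that differ only by case, spaces, hyphens or underscores."""
    groups = defaultdict(set)
    for name in class_names:
        key = name.strip().lower().replace("-", "").replace("_", "").replace(" ", "")
        groups[key].add(name)
    # Only groups with more than one spelling point to a renaming problem.
    return {key: sorted(names) for key, names in groups.items() if len(names) > 1}


labels_seen = ["dog", "dog", "Dog", "dog ", "small-dog", "small_dog", "car"]
print(find_name_variants(labels_seen))
```

Here "dog", "Dog" and "dog " end up in one group and "small-dog" and "small_dog" in another, while "car" has a single spelling and is not reported.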

What happens after labelling

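Once your images are labelled, the dataset can usually be exported into a standard format such as YOLO or COCO. YOLO stores one small text file per image, while COCO stores all of the annotations in a single JSON file. That exported data is what a training pipeline or machine learning engineer will use next.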
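An export tool normally does this for you, but it helps to see what happens to a single box. The Python sketch below converts one box stored as pixel corners into the two common layouts: a YOLO label line (class ID, then box centre, width and height, each divided by the image size) and a COCO `bbox` (top-left corner plus width and height, in pixels). The helper names are hypothetical, class ID 0 assumes "dog" is the first class, and a full COCO file also contains image and category sections not shown here.

```python
def to_yolo_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """YOLO: 'class x_center y_center width height', normalised to 0..1."""
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def to_coco_bbox(x_min, y_min, x_max, y_max):
    """COCO: [x_min, y_min, width, height] in absolute pixels."""
    return [x_min, y_min, x_max - x_min, y_max - y_min]


# The same dog box from a 640x480 image, written both ways.
print(to_yolo_line(0, 120, 200, 310, 420, 640, 480))
print(to_coco_bbox(120, 200, 310, 420))
```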

You do not need to master model training on day one. A strong first step is just this:

  • understand the problem you want to solve
  • label a small dataset consistently
  • export it cleanly

That is already real progress.

Ready to try the workflow on your own images? Start with onboarding, follow Getting Started, and use Billing & Credits if you want to estimate AI usage before you label a larger batch.

A good mindset for your first dataset

Your first dataset is not supposed to be perfect. It is supposed to teach you the workflow.

If you can explain:

  • what the model should detect
  • what each class means
  • how you decided what to label

then you are already doing the important work well.

Machine learning projects become much easier once the dataset has a clear structure. That is exactly why tools like BBoxML exist: to make the first part of the journey feel understandable, not overwhelming.