Listen to the song. Study the timing of the music, as well as the lyrics and their meaning. Songs are often symbolic or metaphorical, so take this into account.
Decide whether the video is going to be an interpretation of the song or a performance piece. In a performance piece, the artist performs the song as if playing it live in concert. Scripts are rarely needed for performance pieces.
Write the script or treatment based on your interpretation of the music. A good way to do this is to think of the song structurally. The average song has at least 2 verses, 3 choruses, a middle eight or breakdown, and possibly a solo or something similar.
Write down the separate movements of the song. A music video usually has at least 3, and often 7 or more, "setups." The setups are the camera shots that the viewer will see intercut throughout the music video.
Write the song movements as you envision them appearing in the music video. Think of the images, settings, costumes and locations that the audience will see. Imagine all of these things in sync with the music and lyrics of the song.
Write the script, interweaving the descriptions of visual elements with the specific moments of the music as they will be heard. The script should be written the same way as any film or television script. Use the proper format, including scene headings such as INT. or EXT. and proper transitions such as CUT TO:, FADE IN:, FADE OUT: and DISSOLVE TO:.