find, xargs and spaces

I commonly use the following (or something close to it) to rip through third-party source to find something I’m interested in:

find . -name "*.java" -type f | xargs grep someMethod

The problem I usually run into is that directory names contain spaces, and by default xargs treats blanks as delimiters, so a single path gets split into several bogus arguments. The trick to getting around this is the following:

find . -name "*.java" -type f -print0 | xargs -0 grep someMethod

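The -print0 option makes find terminate each filename with a null character instead of a newline, and -0 tells xargs to split its input only on null characters, so spaces (and even newlines) in names pass through intact. This article has more information.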
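To see the difference, here is a small, self-contained sketch (not from the original post) that builds a throwaway directory with a space in its name and runs both pipelines against it. It assumes a shell with `mktemp -d`, which GNU and BSD systems provide; the `third party` directory and `Foo.java` file are made-up names for the demo.

```shell
#!/bin/sh
# Build a throwaway tree whose directory name contains a space
# ("third party" and Foo.java are hypothetical names for this demo).
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/third party/src"
printf 'void someMethod() {}\n' > "$demo/third party/src/Foo.java"
cd "$demo" || exit 1

# Default xargs splits on blanks, so grep is handed "./third" and
# "party/src/Foo.java"; neither exists, so grep only prints errors.
broken=$(find . -name "*.java" -type f | xargs grep someMethod 2>/dev/null)

# -print0 ends each name with a NUL byte and -0 splits only on NUL,
# so the whole path reaches grep as one argument.
fixed=$(find . -name "*.java" -type f -print0 | xargs -0 grep someMethod)

printf 'broken: [%s]\nfixed:  [%s]\n' "$broken" "$fixed"
```

Running it should print an empty `broken` result (grep's complaints about the split-up path go to stderr, which the sketch discards) and the matching line for `fixed`.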

And for those of you wondering why I have spaces in directory and file names: A) it’s 2005, people; B) I’m developing on Windows, where they’re more common; and C) I’m looking at third-party source (i.e. go b*tch to someone else).


8 comments

  1. Define “unsafe”! Am I going to lose an arm? Is my wife going to be abducted? Is it going to wipe my hard drive?!?
    The “-exec” option of “find” is orders of magnitude slower than “xargs” for what I use it for. Unless “unsafe” means that lasers are going to shoot out of my mouse and burn my eyeballs out, I think that I’m OK.
    Let us know!

  2. Maybe you’ll be working at a client site and execute something malicious (think of the irssi problems in 2003 just from stdout in terminals) and then have to sell off your arm or sell your wife to pay for the lawsuit… or you could just think of it as safe computing practice… though I see you’re the type of developer who isn’t really aware of these things.

  3. I hate to be negative in comments but you’re touching a nerve here. What you’re telling me is that because some operation has the potential for causing harm in some limited and identifiable domain that I should use it in no other — even in those where it can have no impact.
    So by your logic, I should not use malloc since it’s known to cause memory leaks, which lead to computer crashes, or I shouldn’t increment pointers since there’s a chance that I might run past the end of a block and cause a seg fault. These are unsafe computing practices, right? Hell, I shouldn’t even turn on my computer since it’s possible for someone to hack in and use it as a platform for causing harm to other computers; computers are *very* unsafe. I shouldn’t turn on lights since they might spark and cause fires; home wiring is very unsafe. I shouldn’t exhale since I expel carbon dioxide, which is contributing to global warming, and that’s unsafe. Maybe I should just off myself and save everyone the problem, but then, oh crap, my decomposing body will release methane and other noxious gases into the air and that’s unsafe. I guess I’m just not aware of what I’m doing. Not only am I *not* the “type of developer who isn’t really aware of these things”, I’m not the type of human being that is really aware of these things.
    Thank you for pointing out my ignorance and incompetence even though I *clearly* identified the domain in which I use this highly unsafe and should-be-stricken-from-the-earth technology and that domain has no chance of ever causing an impact.
    Yes, my friend, even the most unsafe practices *must* be accompanied by a domain in which they are either safe to use (i.e. cause no undesired impact) or are unsafe. You can’t simply make a statement that says “XYZ is very unsafe”.
    You might simply have said something to the effect of “xargs(1) is not recommended in environments where there is the potential for executing malicious code that might cause a negative impact (such as at a client’s site) since there are known cases XYZ where it has been shown to be possible”. This is what is commonly referred to as a “well laid out argument”: it has a statement of concern and its outcome, a domain of applicability, and supporting evidence.
    Thank you for your time and concern but please peddle your goods elsewhere.

  4. I think the commenter was trying to point out that calling exec(3) from find(1) is a bit safer because you’re still stuck in the find(1) memory space, forked and not pipelined. It gets tricky though: what operating platform was the commenter discussing? Eh, they prolly won’t check here again, but pipelining on some platforms might be better off if you have stringent constraints on what any program making system calls to the kernel can do (see OpenBSD’s systrace(4)).
    Some machines hosted by monkey.org were claimed to have been hacked through some code execution in the actual TTY (from stdout) years ago. For that reason, systrace(4) was developed by dugsong. How this relates to your post, I don’t know…this is just rambling now.
    P.S. s/incompitence/incompetence/g 🙂

  5. The one good thing about Firefox 2.0 is spellcheck…
    …when the browser isn’t using up 100TB of memory or trying to restore non-existent sessions or crashing or rendering XML too strictly or….
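Several comments above argue about find’s -exec versus xargs. As a hedged sketch (with another made-up demo tree), the two -exec forms differ a lot in cost: `{} \;` starts one grep per file, which is plausibly the slowness described in the first comment, while `{} +` (specified by POSIX and supported by GNU and BSD find) appends many paths to each grep command line, much as xargs does. Because find hands each path to grep as a separate argument with no word splitting, neither form trips over spaces.

```shell
#!/bin/sh
# Hypothetical demo tree; "third party", Foo.java and Bar.java are made up.
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
mkdir -p "$demo/third party/src"
printf 'void someMethod() {}\n' > "$demo/third party/src/Foo.java"
printf 'void other() {}\n' > "$demo/third party/src/Bar.java"
cd "$demo" || exit 1

# "{} \;" runs grep once per file: one fork/exec for every path find emits.
# -H (GNU and BSD grep) prints the filename even when grep gets one file.
per_file=$(find . -name "*.java" -type f -exec grep -H someMethod {} \;)

# "{} +" packs as many paths as fit onto one grep command line, like xargs,
# and never word-splits them, so the space in the path is harmless.
batched=$(find . -name "*.java" -type f -exec grep -H someMethod {} +)

printf '%s\n%s\n' "$per_file" "$batched"
```

Both variables should hold the same single match; on a large source tree, the `{} +` form avoids spawning a separate grep for every file.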

