Capturing the output from ffmpeg

2010 August 20
by derrick

I needed to collect the output from ffmpeg for some profiling. It proved more challenging than I anticipated because ffmpeg writes the data unflushed to stderr, which makes it unreadable using ordinary stdio reads. To get the data, the stderr file descriptor has to be set to O_NONBLOCK using fcntl. Here is the resulting Python code.

import fcntl
import os
import re
import select
import shlex
import subprocess
import time

def encode(filename, callback=None):
    cmd = 'ffmpeg -i "%s" -acodec libfaac -ab 128kb ' + \
          '-vcodec mpeg4 -b 1200kb -mbd 2 -flags +4mv ' + \
          '-trellis 2 -cmp 2 -subcmp 2 -s 320x180 "%s.mp4"'
    pipe = subprocess.Popen(
        shlex.split(cmd % (filename, os.path.splitext(filename)[0])),
        stderr=subprocess.PIPE,
        close_fds=True
    )
    fcntl.fcntl(
        pipe.stderr.fileno(),
        fcntl.F_SETFL,
        fcntl.fcntl(pipe.stderr.fileno(), fcntl.F_GETFL) | os.O_NONBLOCK,
    )
    # frame=   29 fps=  0 q=2.6 size=     114kB time=0.79 bitrate=1181.0kbits/s
    reo = re.compile(r"""\S+\s+(?P<frame>\d+)           # frame
                         \s\S+\s+(?P<fps>\d+)           # fps
                         \sq=(?P<q>\S+)                    # q
                         \s\S+\s+(?P<size>\S+)          # size
                         \stime=(?P<time>\S+)           # time
                         \sbitrate=(?P<bitrate>[\d\.]+) # bitrate
                         """, re.X)
    while True:
        readx = select.select([pipe.stderr.fileno()], [], [])[0]
        if readx:
            chunk = pipe.stderr.read()
            if chunk == '':
                break
            m = reo.match(chunk)
            if m and callback:
                callback(m.groupdict())
        time.sleep(.1)

The complete script is located here.
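
The progress regex can be checked on its own against the sample line from the comment in encode. This is a minimal sketch in Python 3. parse_progress is a hypothetical helper name, not part of the original script, and it assumes the chunk starts with a progress line in the format shown above.

```python
import re

# Same fields as the verbose pattern in encode(), compiled once.
PROGRESS_RE = re.compile(r"""
    \S+\s+(?P<frame>\d+)             # frame=   29
    \s\S+\s+(?P<fps>\d+)             # fps=  0
    \sq=(?P<q>\S+)                   # q=2.6
    \s\S+\s+(?P<size>\S+)            # size=     114kB
    \stime=(?P<time>\S+)             # time=0.79
    \sbitrate=(?P<bitrate>[\d\.]+)   # bitrate=1181.0kbits/s
    """, re.X)


def parse_progress(chunk):
    """Return the progress fields as a dict of strings, or None if no match."""
    if isinstance(chunk, bytes):
        # Popen pipes yield bytes in Python 3 unless text mode is requested.
        chunk = chunk.decode("utf-8", "replace")
    m = PROGRESS_RE.match(chunk)
    return m.groupdict() if m else None


sample = "frame=   29 fps=  0 q=2.6 size=     114kB time=0.79 bitrate=1181.0kbits/s"
fields = parse_progress(sample)
print(fields["frame"], fields["time"], fields["bitrate"])
```

All values come back as strings, so a callback that wants numbers would still need to convert them.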

Learning Erlang

2010 August 2
by derrick

I have been really loving Python lately, but after reading this post I thought it would be a good idea to check out Erlang. I have heard its concurrency and network support are out of this world, and in a knowledge-based industry extra knowledge never hurts. This simple code snippet shows how much Erlang differs from traditional procedural languages.

average(X) -> sum(X) / len(X).
sum([H|T]) -> H + sum(T);
sum([]) -> 0.
len([_|T]) -> 1 + len(T);
len([]) -> 0.

average takes a list X and calls sum and len on it. Both are recursive functions that split the list into the first element H and the remainder T. Variables must start with a capital letter, and ‘_’ marks a value that is not used. Notice that no temporary variables were used in this example. Talk about putting the “f” in functional. I can’t wait to get to the concurrent stuff.
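
For comparison, here is a rough Python 3 rendering of the same head-and-tail recursion. It is only a sketch to show the shape of the Erlang clauses. The names total and count are mine, chosen to avoid shadowing Python's built-in sum and len.

```python
def total(items):
    """Sum a list recursively: head plus the sum of the tail."""
    if not items:               # corresponds to sum([]) -> 0
        return 0
    head, *tail = items         # corresponds to sum([H|T])
    return head + total(tail)


def count(items):
    """Count elements recursively, ignoring the head's value."""
    if not items:               # corresponds to len([]) -> 0
        return 0
    _, *tail = items            # corresponds to len([_|T])
    return 1 + count(tail)


def average(items):
    # Like the Erlang version, this fails on an empty list (division by zero).
    return total(items) / count(items)


print(average([2, 4, 6, 8]))
```

Python needs an explicit if for the empty-list case, where Erlang selects the clause by pattern matching.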

Upload files to S3 with progressbar

2010 July 28
by derrick

Here is a script I wrote to upload files to S3. It differs from s3put in that there is a progress indicator showing the ETA and the percentage uploaded and a summary after the transfer is complete.
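
The percentage and ETA come down to simple arithmetic on the bytes sent so far. The sketch below is not the script itself. format_progress and make_callback are hypothetical names, and it assumes the upload library reports progress by calling a function with the bytes transmitted and the total size, which is how boto's cb parameter works.

```python
import time


def format_progress(sent, total, elapsed):
    """Format percentage and ETA for `sent` of `total` bytes after `elapsed` seconds."""
    percent = 100.0 * sent / total if total else 100.0
    if sent <= 0 or elapsed <= 0:
        return "%5.1f%% ETA --" % percent     # no rate to extrapolate from yet
    rate = sent / elapsed                     # average bytes per second so far
    eta = (total - sent) / rate               # seconds left at that rate
    return "%5.1f%% ETA %ds" % (percent, round(eta))


def make_callback(clock=time.time):
    """Build a (sent, total) callback that prints progress since creation."""
    start = clock()

    def callback(sent, total):
        print(format_progress(sent, total, clock() - start))

    return callback


# With boto this could be wired up roughly as (assumption, not run here):
#   key.set_contents_from_filename(path, cb=make_callback(), num_cb=100)
print(format_progress(50, 200, 10.0))
```

The ETA uses the average rate since the start, which is simple but slow to react if the transfer speed changes.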



To install it, run these commands:

sudo pip install boto progressbar
git clone git://gist.github.com/510222.git s3upload-gist
vim s3upload-gist/s3upload # Add your AWS settings
chmod 755 s3upload-gist/s3upload
sudo mv s3upload-gist/s3upload /usr/local/bin
rm -rf s3upload-gist

Django Facebook Authentication Backend

2010 July 15
by derrick

Here is a Django authentication backend I wrote using Facebook’s amazingly simple Graph API. It logs the user in using their Facebook credentials so your site doesn’t have to worry about creating user profiles, validating, etc. See:

http://developers.facebook.com/docs/authentication/
http://developers.facebook.com/docs/authentication/permissions
http://developers.facebook.com/docs/api
http://github.com/facebook/python-sdk/blob/master/examples/oauth/facebookoauth.py

Define the Facebook tokens in settings.py and replace the placeholder with the name of your app. You will probably want to modify the scope on the authorize link in the template; see the authentication permissions link.
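
To illustrate what changing the scope on the authorize link means, here is a small Python 3 sketch that builds such a link. Everything in it is an assumption for illustration: authorize_link is a hypothetical helper, the endpoint is the one I recall from the 2010-era python-sdk OAuth example linked above, and the permission names are only examples.

```python
from urllib.parse import urlencode

# Assumption: OAuth endpoint from the 2010-era python-sdk example.
AUTHORIZE_URL = "https://graph.facebook.com/oauth/authorize"


def authorize_link(app_id, redirect_uri, scope=("email",)):
    """Build the login link; `scope` lists the extra permissions requested."""
    params = [("client_id", app_id), ("redirect_uri", redirect_uri)]
    if scope:
        # Permissions are sent as one comma-separated value.
        params.append(("scope", ",".join(scope)))
    return AUTHORIZE_URL + "?" + urlencode(params)


link = authorize_link("123", "http://example.com/login/",
                      scope=("email", "user_birthday"))
print(link)
```

Requesting fewer permissions usually makes the consent screen less intimidating, so it is worth trimming the scope to what the site actually uses.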

Simple python syslog wrapper class

2010 July 15
by derrick

I just wanted a simple wrapper around syslog. The Python logging module is good but it was too heavyweight for what I needed. Here is a simple logging class for syslog. It has an optional decorator to provide the function name to syslog, which I find useful for debugging.
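
A wrapper along these lines might look like the following Python 3 sketch. It is not the class from the post. SysLogger, its method names and the traced decorator are hypothetical, and the default facility is an arbitrary choice.

```python
import functools
import syslog


class SysLogger(object):
    """Thin wrapper around the syslog module (illustrative sketch)."""

    def __init__(self, ident, facility=syslog.LOG_USER):
        # Tag every message with `ident` and the process id.
        syslog.openlog(ident, syslog.LOG_PID, facility)

    def log(self, priority, message, func_name=None):
        if func_name:
            message = "%s: %s" % (func_name, message)
        syslog.syslog(priority, message)

    def info(self, message, func_name=None):
        self.log(syslog.LOG_INFO, message, func_name)

    def error(self, message, func_name=None):
        self.log(syslog.LOG_ERR, message, func_name)

    def traced(self, func):
        """Decorator that logs each call, prefixed with the function's name."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            self.info("called", func_name=func.__name__)
            return func(*args, **kwargs)
        return wrapper
```

A function decorated with the traced method of a SysLogger instance writes a line such as "work: called" to syslog each time it runs, which is often enough to follow the flow of a daemon.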

Python function runtime logging decorator

2010 July 15
by derrick

When profiling it can be useful to log the amount of time that is spent in a function. With Python that is super easy to do with decorators.

#!/usr/bin/python

import time
import syslog

def logtime(func):
    def caller(*args, **kwargs):
        stime = time.time()
        ret = func(*args, **kwargs)
        syslog.syslog(
            syslog.LOG_LOCAL2 | syslog.LOG_INFO,
            '%s=%s\n' % (func.__name__, time.time() - stime))
        return ret
    return caller

@logtime
def test_func(arg1, arg2=None):
    print arg1, arg2
    time.sleep(1)

if __name__ == '__main__':
    test_func(1, 2)

logtime logs the time spent in the function to syslog:

Jul 14 15:05:01 olomai python: test_func=1.00114893913

Comparison of IN, GROUP BY and COUNT using Hibernate, Django and SQLAlchemy

2010 July 7
by derrick

The other day I wrote about how to do an IN and GROUP BY query using Java’s de facto ORM, Hibernate. I thought it would be interesting to see how other ORMs handled the same query. This is the query I want to generate:

SELECT COUNT(*),state FROM download_request WHERE id IN (<id list>) GROUP BY state;
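
To see what the raw query returns without any ORM, it can be run against an in-memory SQLite database. This is a Python 3 sketch using the standard sqlite3 module. The table layout and the ten rows are made up for illustration and are not the real download_request data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE download_request (id INTEGER PRIMARY KEY, state TEXT)")

# Made-up sample rows for ids 1000..1009.
states = ["FAILED"] * 5 + ["COMPLETED"] * 4 + ["DOWNLOADED"]
conn.executemany(
    "INSERT INTO download_request (id, state) VALUES (?, ?)",
    zip(range(1000, 1010), states),
)

ids = list(range(1000, 1010))
# One "?" placeholder per id keeps the IN list parameterised.
placeholders = ", ".join("?" * len(ids))
sql = (
    "SELECT COUNT(*), state FROM download_request "
    "WHERE id IN (%s) GROUP BY state" % placeholders
)
counts = {state: n for n, state in conn.execute(sql, ids)}
print(counts)
```

Each ORM in the comparison ends up building this same parameterised IN list and GROUP BY clause.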

Below is the code, output and SQL generated for the three ORMs.

Hibernate

class HibernateDAO implements ApplicationDAO {
    public Map<String, Integer> getStateCounts(final Collection<Integer> ids) {
        HibernateSession hibernateSession = new HibernateSession();
        Session session = hibernateSession.getSession();
        // WHERE id IN (...)
        Criteria criteria = session.createCriteria(DownloadRequestEntity.class)
            .add(Restrictions.in("id", ids));
        // GROUP BY state with COUNT(*)
        ProjectionList projectionList = Projections.projectionList();
        projectionList.add(Projections.groupProperty("state"));
        projectionList.add(Projections.rowCount());
        criteria.setProjection(projectionList);
        // Each row is {state, count}
        List<Object[]> results = criteria.list();
        Map<String, Integer> stateMap = new HashMap<String, Integer>();
        for (Object[] obj : results) {
            DownloadState downloadState = (DownloadState) obj[0];
            stateMap.put(downloadState.getDescription().toLowerCase(), (Integer) obj[1]);
        }
        hibernateSession.closeSession();
        return stateMap;
    }

    public static void main(String[] args) {
        HibernateDAO downloadRequestDAO = new HibernateDAO();
        Collection<Integer> ids = new ArrayList<Integer>();
        for (int i = 1000; i < 1010; i++)
            ids.add(i);
        Map<String, Integer> stateCounts = downloadRequestDAO.getStateCounts(ids);
        for (String state : stateCounts.keySet()) {
            System.out.println(state + ": " + stateCounts.get(state));
        }
    }
}

Output

failed: 5
downloaded: 1
completed: 4

SQL

select this_.state as y0_, count(*) as y1_ from download_request this_
where this_.id in (1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009)
group by this_.state

Django

counts = models.DownloadRequest.objects.filter(
    id__in=range(1000, 1010),
).values('state').annotate(Count('state'))
for count in counts:
    print count

Output

{'state': u'FAILED', 'state__count': 5}
{'state': u'COMPLETED', 'state__count': 4}
{'state': u'DOWNLOADED', 'state__count': 1}

SQL

SELECT `download_request`.`state`, COUNT(`download_request`.`state`)
AS `state__count` FROM `download_request`
WHERE `download_request`.`id` IN (1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009)
GROUP BY `download_request`.`state` ORDER BY NULL

SQLAlchemy

query = session.query(
    func.count(DownloadRequest.state), DownloadRequest.state,
).filter(
    DownloadRequest.id.in_(range(1000,1010)),
).group_by(DownloadRequest.state)
for count in query.all():
    print count

Output

(4L, 'COMPLETED')
(1L, 'DOWNLOADED')
(5L, 'FAILED')

SQL

SELECT count(download_request.state) AS count_1, download_request.state
AS download_request_state FROM download_request
WHERE download_request.id IN (1000, 1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009)
GROUP BY download_request.state

As you can see, SQLAlchemy is the most similar to SQL, Django’s is the briefest and Hibernate (obviously) is the most Java-like. Of the three I like SQLAlchemy the best: coming from an SQL background, I find it the most natural. However, all three get the job done and it is always great to have options.

IN, GROUP BY and COUNT with Hibernate

2010 July 2
by derrick

I needed to make the following SQL query with Hibernate:

SELECT COUNT(*),state FROM download_request WHERE id IN (<id list>) GROUP BY state;

Being new to Hibernate, I found it came out a lot differently than I thought it would. To perform the IN query, a Criteria query needs to be created:

Criteria criteria = session.createCriteria(DownloadRequestEntity.class)
    .add(Restrictions.in("id", ids));

For the count and group by, a projection needs to be added to the criteria:

ProjectionList projectionList = Projections.projectionList();
projectionList.add(Projections.groupProperty("state"));
projectionList.add(Projections.rowCount());
criteria.setProjection(projectionList);

This is the resulting code:

public Map<String, Integer> getStateCounts(final Collection<Integer> ids) {
    HibernateSession hibernateSession = new HibernateSession();
    Session session = hibernateSession.getSession();
    // WHERE id IN (...)
    Criteria criteria = session.createCriteria(DownloadRequestEntity.class)
        .add(Restrictions.in("id", ids));
    // GROUP BY state with COUNT(*)
    ProjectionList projectionList = Projections.projectionList();
    projectionList.add(Projections.groupProperty("state"));
    projectionList.add(Projections.rowCount());
    criteria.setProjection(projectionList);
    // Each row is {state, count}
    List<Object[]> results = criteria.list();
    Map<String, Integer> stateMap = new HashMap<String, Integer>();
    for (Object[] obj : results) {
        DownloadState downloadState = (DownloadState) obj[0];
        stateMap.put(downloadState.getDescription().toLowerCase(), (Integer) obj[1]);
    }
    hibernateSession.closeSession();
    return stateMap;
}

Something completely different from what I expected. That’s what I love about solving problems: sometimes the solution is something you might never expect.

Init script for fastcgi and php on Ubuntu

2010 July 2
by derrick

This is an init script to run spawn-fcgi and PHP on Ubuntu. It’s adapted from Aaron Schaefer’s excellent post on how to run WordPress on nginx, the configuration this site runs on.

To install it, follow the instructions below:

git clone git://gist.github.com/510245.git fastcgi-php.gist
vim fastcgi-php.gist/fastcgi-php # Update with your pathnames
chmod 755 fastcgi-php.gist/fastcgi-php
mv fastcgi-php.gist/fastcgi-php /etc/init.d/
update-rc.d fastcgi-php defaults
/etc/init.d/fastcgi-php start
rm -rf fastcgi-php.gist